PREFACE TO THE FOURTH EDITION


This fourth edition contains several additions. The main ones con- 
cern three closely related topics: Brownian motion, functional limit 
distributions, and random walks. Besides the power and ingenuity of 
their methods and the depth and beauty of their results, their importance 
is fast growing in Analysis as well as in theoretical and applied Proba- 
bility. 

These additions increased the book to an unwieldy size and it had to 
be split into two volumes. 

About half of the first volume is devoted to an elementary introduc- 
tion, then to mathematical foundations and basic probability concepts 
and tools. The second half is devoted to a detailed study of Independ- 
ence which played and continues to play a central role both by itself and 
as a catalyst. 

The main additions consist of a section on convergence of probabilities 
on metric spaces and a chapter whose first section on domains of attrac- 
tion completes the study of the Central limit problem, while the second 
one is devoted to random walks. 

About a third of the second volume is devoted to conditioning and 
properties of sequences of various types of dependence. The other two 
thirds are devoted to random functions; the last Part on Elements of 
random analysis is more sophisticated. 

The main addition consists of a chapter on Brownian motion and limit 
distributions. 

It is strongly recommended that the reader begin with less involved 
portions. In particular, the starred ones ought to be left out until they 
are needed or unless the reader is especially interested in them. 

I take this opportunity to thank Mrs. Rubalcava for her beautiful 
typing of all the editions since the inception of the book. I also wish to 
thank the editors of Springer-Verlag, New York, for their patience and 
care. 

M. L. 
January, 1977
Berkeley, California 


PREFACE TO THE THIRD EDITION 


This book is intended as a text for graduate students and as a reference 
for workers in Probability and Statistics. The prerequisite is honest 
calculus. The material covered in Parts Two to Five inclusive requires 
about three to four semesters of graduate study. The introductory part 
may serve as a text for an undergraduate course in elementary prob- 
ability theory. 

The Foundations are presented in: 


the Introductory Part on the background of the concepts and prob- 
lems, treated without advanced mathematical tools; 

Part One on the Notions of Measure Theory that every probabilist 
and statistician requires; 

Part Two on General Concepts and Tools of Probability Theory. 


Random sequences whose general properties are given in the Founda- 
tions are studied in: 


Part Three on Independence devoted essentially to sums of inde- 
pendent random variables and their limit properties; 

Part Four on Dependence devoted to the operation of conditioning 
and limit properties of sums of dependent random variables. The 
last section introduces random functions of second order. 


Random functions and processes are discussed in: 


Part Five on Elements of random analysis devoted to the basic con- 
cepts of random analysis and to the martingale, decomposable, 
and Markov types of random functions. 


Since the primary purpose of the book is didactic, methods are 
emphasized and the book is subdivided into: 


unstarred portions, independent of the remainder; starred portions, 
which are more involved or more abstract; 

complements and details, including illustrations and applications of 
the material in the text, which consist of propositions with frequent
hints; most of these propositions can be found in the
articles and books referred to in the Bibliography. 


Also, for teaching and reference purposes, it has proved useful to name 
most of the results. 

Numerous historical remarks about results, methods, and the evolu- 
tion of various fields are an intrinsic part of the text. The purpose is 
purely didactic: to attract attention to the basic contributions while 
introducing the ideas explored. Books and memoirs of authors whose 
contributions are referred to and discussed are cited in the Bibliography, 
which parallels the text in that it is organized by parts and, within parts, 
by chapters. Thus the interested student can pursue his study in the 
original literature. 

This work owes much to the reactions of the students on whom it has 
been tried year after year. However, the book is definitely more concise 
than the lectures, and the reader will have to be armed permanently 
with patience, pen, and calculus. Besides, in mathematics, as in any 
form of poetry, the reader has to be a poet in posse. 

This third edition differs from the second (1960) in a number of 
places. Modifications vary all the way from a prefix (“sub”-martingale
in lieu of “semi”-martingale) to an entire subsection (§36.2). To preserve
pagination, some additions to the text proper (especially 9, p. 656)
had to be put in the Complements and Details. It is hoped that more- 
over most of the errors have been eliminated and that readers will be 
kind enough to inform the author of those which remain. 

I take this opportunity to thank those whose comments and criticisms 
led to corrections and improvements: for the first edition, E. Barankin, S. 
Bochner, E. Parzen, and H. Robbins; for the second edition, Y. S. Chow,
R. Cogburn, J. L. Doob, J. Feldman, B. Jamison, J. Karush, P. A. Meyer, 
J. W. Pratt, B. A. Sevastianov, J. W. Woll; for the third edition, S. 
Dharmadhikari, J. Fabius, D. Freedman, A. Maitra, U. V. Prokhorov. 
My warm thanks go to Cogburn, whose constant help throughout the 
preparation of the second edition has been invaluable. This edition has 
been prepared with the partial support of the Office of Naval Research 
and of the National Science Foundation. 

M. L. 
April, 1962 
Berkeley, California


Introductory Part


ELEMENTARY PROBABILITY THEORY 


Probability theory is concerned with the mathematical analysis of 
the intuitive notion of “chance” or “randomness,” which, like all notions,
is born of experience. The quantitative idea of randomness first
took form at the gaming tables, and probability theory began, with 
Pascal and Fermat (1654), as a theory of games of chance. Since then, 
the notion of chance has found its way into almost all branches of knowl- 
edge. In particular, the discovery that physical “observables,” even 
those which describe the behavior of elementary particles, were to be 
considered as subject to laws of chance made an investigation of the 
notion of chance basic to the whole problem of rational interpretation 
of nature. 

A theory becomes mathematical when it sets up a mathematical 
model of the phenomena with which it is concerned, that is, when, to
describe the phenomena, it uses a collection of well-defined symbols 
and operations on the symbols. As the number of phenomena, to- 
gether with their known properties, increases, the mathematical model 
evolves from the early crude notions upon which our intuition was 
built in the direction of higher generality and abstractness. 

In this manner, the inner consistency of the model of random phe- 
nomena became doubtful, and this forced a rebuilding of the whole 
structure in the second quarter of this century, starting with a formula- 
tion in terms of axioms and definitions. Thus, there appeared a branch 
of pure mathematics—probability theory—concerned with the construc- 
tion and investigation per se of the mathematical model of randomness. 

The purpose of the Introductory Part (of which the other parts of 
this book are independent) is to give “intuitive meaning” to the concepts
and problems of probability theory. First, by analyzing briefly
some ideas derived from everyday experience, especially from games of
chance, we shall arrive at an elementary axiomatic setup; we leave the
illustrations with coins, dice, cards, darts, etc., to the reader. Then, 
we shall apply this axiomatic setup to describe in a precise manner 
and to investigate in a rigorous fashion a few of the “intuitive notions” 
relative to randomness. No special tools will be needed, whereas in 
the nonelementary setup measure-theoretic concepts and Fourier- 
Stieltjes transforms play a prominent role. 


I. INTUITIVE BACKGROUND 


1. Events. The primary notion in the understanding of nature is that 
of event—the occurrence or nonoccurrence of a phenomenon. The ab- 
stract concept of event pertains only to its occurrence or nonoccurrence 
and not to its nature. This is the concept we intend to analyze. We
shall denote events by $A, B, C, \cdots$ with or without affixes.

To every event A there corresponds a contrary event “not A,” to
be denoted by $A^c$; $A^c$ occurs if, and only if, A does not occur. An event
may imply another event: A implies B if, when A occurs, then B necessarily
occurs; we write $A \subset B$. If A implies B and also B implies A,
then we say that A and B are equivalent; we write $A = B$. The nature
of two equivalent events may be different, but as long as we are concerned
only with occurrence or nonoccurrence, they can and will be
identified. Events are combined into new events by means of operations
expressed by the terms “and,” “or” and “not.”

A “and” B is an event which occurs if, and only if, both the event A
and the event B occur; we denote it by $A \cap B$ or, simply, by $AB$. If
$AB$ cannot occur (that is, if A occurs, then B does not occur, and if B
occurs, then A does not occur), we say that the event A and the event
B are disjoint (exclude one another, are mutually exclusive, are
incompatible).

A “or” B is an event which occurs if, and only if, at least one of the
events A, B occurs; we denote it by $A \cup B$. If, and only if, A and B
are disjoint, we replace “or” by +. Similarly, more than two events
can be combined by means of “and,” “or”; we write

\[ A_1 \cap A_2 \cap \cdots \cap A_n \quad\text{or}\quad A_1 A_2 \cdots A_n \quad\text{or}\quad \bigcap_{k=1}^{n} A_k, \]

\[ A_1 \cup A_2 \cup \cdots \cup A_n \quad\text{or}\quad \bigcup_{k=1}^{n} A_k, \qquad A_1 + A_2 + \cdots + A_n \quad\text{or}\quad \sum_{k=1}^{n} A_k. \]


There are two combinations of events which can be considered as
“boundary events”; they are the first and the last events, in terms of
implication. Events of the form $A + A^c$ can be said to represent an
“always occurrence,” for they can only occur. Since, whatever be the
event A, the events $A + A^c$ and the events they imply are equivalent,
all such events are to be identified and will be called the sure event, to
be denoted by $\Omega$. Similarly, events of the form $AA^c$ and the events
which imply them, which can be said to represent a “never occurrence”
for they cannot occur, are to be identified, and will be called the impossible
event, to be denoted by $\emptyset$; thus, the definition of disjoint events A
and B can be written $AB = \emptyset$. The impossible and the sure events are
“first” and “last” events, for, whatever be the event A, we have
$\emptyset \subset A \subset \Omega$.

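The interpretation of symbols $\subset$, $=$, $\cap$, $\cup$, in terms of occurrence
and nonoccurrence, shows at once that

\[ \text{if } A \subset B, \text{ then } B^c \subset A^c, \text{ and conversely;} \]
\[ AB = BA, \qquad A \cup B = B \cup A; \]
\[ (AB)C = A(BC), \qquad (A \cup B) \cup C = A \cup (B \cup C); \]
\[ A(B \cup C) = AB \cup AC, \qquad A \cup BC = (A \cup B)(A \cup C); \]
\[ (AB)^c = A^c \cup B^c, \qquad (A \cup B)^c = A^c B^c, \qquad A \cup B = A + A^c B; \]

more generally

\[ \Big( \bigcap_{k=1}^{n} A_k \Big)^c = \bigcup_{k=1}^{n} A_k^c, \qquad \Big( \bigcup_{k=1}^{n} A_k \Big)^c = \bigcap_{k=1}^{n} A_k^c, \]

and so on.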

We recognize here the rules of operations on sets. In terms of sets,
$\Omega$ is the space in which lie the sets $A, B, C, \cdots$, $\emptyset$ is the empty set, $A^c$
is the set complementary to the set A; $AB$ is the intersection, $A \cup B$
is the union of the sets A and B, and $A \subset B$ means that A is contained
in B.
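
Since these are the rules of operations on sets, they can be checked mechanically on a small model. The following Python sketch assumes a hypothetical sure event of four points and verifies the displayed rules for every triple of events; the names `OMEGA`, `complement`, and `rules_hold` are illustrative only.

```python
from itertools import combinations

# Hypothetical sure event: a space of four points.
OMEGA = frozenset(range(4))

def complement(a):
    """Contrary event A^c, relative to the sure event OMEGA."""
    return OMEGA - a

def all_events():
    """All subsets of OMEGA, from the impossible to the sure event."""
    points = sorted(OMEGA)
    return [frozenset(c)
            for r in range(len(points) + 1)
            for c in combinations(points, r)]

def rules_hold(a, b, c):
    """Check the displayed rules for one triple of events."""
    return all([
        (a <= b) == (complement(b) <= complement(a)),        # A in B iff B^c in A^c
        (a & b) & c == a & (b & c),                          # (AB)C = A(BC)
        a & (b | c) == (a & b) | (a & c),                    # A(B u C) = AB u AC
        a | (b & c) == (a | b) & (a | c),                    # A u BC = (A u B)(A u C)
        complement(a & b) == complement(a) | complement(b),  # (AB)^c = A^c u B^c
        complement(a | b) == complement(a) & complement(b),  # (A u B)^c = A^c B^c
        a | b == a | (complement(a) & b),                    # A u B = A + A^c B,
        not a & (complement(a) & b),                         # the two terms being disjoint
    ])

EVENTS = all_events()
ALL_RULES_HOLD = all(rules_hold(a, b, c)
                     for a in EVENTS for b in EVENTS for c in EVENTS)
```

A check on one finite space is of course no proof; it only illustrates that the rules for events coincide with those for sets.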

In science, or, more precisely, in the investigation of “laws of nature,” 
events are classified into conditions and outcomes of an experiment. 
Conditions of an experiment are events which are known or are made to 
occur. Outcomes of an experiment are events which may occur when 
the experiment is performed, that is, when its conditions occur. All 
(finite) combinations of outcomes by means of “not,” “and,” “or,” are
outcomes; in the terminology of sets, the outcomes of an experiment
form a field (or an “algebra” of sets). The conditions of an experiment,
together with its field of outcomes, constitute a trial. Any (finite)
number of trials can be combined by “conditioning,” as follows:

The collective outcomes are combinations by means of “not,”
“and,” “or,” of the outcomes of the constituent trials. The condi- 
tions are conditions of the first constituent trial together with condi- 
tions of the second to which are added the observed outcomes of the 
first, and so on. Thus, given the observed outcomes of the preceding 
trials, every constituent trial is performed under supplementary condi- 
tions: it is conditioned by the observed outcomes. When, for every 
constituent trial, any outcome occurs if, and only if, it occurs without 
such conditioning, we say that the trials are completely independent. 
If, moreover, the trials are identical, that is, have the same conditions 
and the same field of outcomes, we speak of repeated trials or, equiva- 
lently, identical and completely independent trials. The possibility of re- 
peated trials is a basic assumption in science, and in games of chance: 
every trial can be performed again and again, the knowledge of past and 
present outcomes having no influence upon future ones. 

2. Random events and trials. Science is essentially concerned with 
permanencies in repeated trials. For a long time Homo sapiens investi- 
gated deterministic trials only, where the conditions (causes) determine 
completely the outcomes (effects). Although another type of perma- 
nency has been observed in games of chance, it is only recently that 
Homo sapiens was led to think of a rational interpretation of nature in 
terms of these permanencies: nature plays the greatest of all games of 
chance with the observer. This type of permanency can be described 
as follows: 

Let the frequency of an outcome A in n repeated trials be the ratio
$n_A/n$ of the number $n_A$ of occurrences of A to the total number n of
trials. If, in repeating a trial a large number of times, the observed
frequencies of any one of its outcomes A cluster about some number,
the trial is then said to be random. For example, in a game of dice (two
homogeneous ones) “double-six” occurs about once in 36 times, that
is, its observed frequencies cluster about 1/36. The number 1/36 is a
permanent numerical property of “double-six” under the conditions of 
the game, and the observed frequencies are to be thought of as measure- 
ments of the property. This is analogous to stating that, say, a bar 
at a fixed temperature has a permanent numerical property called its 
“length” about which the measurements cluster. 
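
The clustering of frequencies can be imitated numerically. The sketch below is only an illustration: it assumes that a pseudo-random generator is an acceptable stand-in for two homogeneous dice, and the function name `double_six_frequency` and the seed are hypothetical choices.

```python
import random

def double_six_frequency(n_trials, seed=0):
    """Observed frequency n_A / n of "double-six" in n_trials simulated throws."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    occurrences = 0
    for _ in range(n_trials):
        first, second = rng.randint(1, 6), rng.randint(1, 6)
        if first == 6 and second == 6:
            occurrences += 1
    return occurrences / n_trials

# Frequencies for increasing numbers of repeated trials; they are
# measurements of the same number 1/36.
FREQUENCIES = {n: double_six_frequency(n) for n in (100, 10_000, 200_000)}
```

For the larger numbers of trials the observed values should lie close to 1/36, in the same way that repeated measurements of a length lie close to one another.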

The outcomes of a random trial are called random (chance) events. 
The number measured by the observed frequencies of a random event
A is called the probability of A and is denoted by $PA$. Clearly, $P\emptyset = 0$,
$P\Omega = 1$ and, for every A, $0 \le PA \le 1$. Since the frequency of a sum
$A_1 + A_2 + \cdots + A_n$ of disjoint random events is the sum of their
frequencies, we are led to assume that

\[ P(A_1 + A_2 + \cdots + A_n) = PA_1 + PA_2 + \cdots + PA_n. \]

Furthermore, let $n_A$, $n_B$, $n_{AB}$ be the respective numbers of occurrences
of outcomes A, B, AB in n repeated random trials. The frequency of
outcome B in the $n_A$ trials in which A occurs is

\[ \frac{n_{AB}}{n_A} = \frac{n_{AB}}{n} \Big/ \frac{n_A}{n} \]

and measures the ratio $PAB/PA$, to be called probability of B given A
(given that A occurs); we denote it by $P_A B$ and have

\[ PAB = PA \cdot P_A B. \]


Thus, when to the original conditions of the trial is added the fact that 
A occurs, then the probability $PB$ of B is transformed into the probability
$P_A B$ of B given A. This leads to defining B as being stochastically
independent of A if $P_A B = PB$ or

\[ PAB = PA \cdot PB. \]

Then it follows that A is stochastically independent of B, for

\[ P_B A = \frac{PAB}{PB} = PA, \]

and it suffices to say that A and B are stochastically independent. (We
assumed in the foregoing ratios that the denominators were not null.) 
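
As an illustration of the relation $PAB = PA \cdot P_A B$, the following Python sketch computes probabilities given A by exact counting. It assumes a hypothetical trial, the throw of two homogeneous dice with 36 equally likely outcomes; the names `prob` and `cond_prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed model: 36 equally likely outcomes of a throw of two dice.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def cond_prob(b, a):
    """P_A B = PAB / PA, defined only when PA is not null."""
    pa = prob(a)
    if pa == 0:
        raise ValueError("the denominator PA is null")
    return prob(lambda w: a(w) and b(w)) / pa

def first_is_six(w):
    return w[0] == 6

def second_is_six(w):
    return w[1] == 6

def sum_is_twelve(w):
    return w[0] + w[1] == 12

# "Second die shows six" is stochastically independent of "first die shows six".
INDEPENDENT = cond_prob(second_is_six, first_is_six) == prob(second_is_six)
# "Sum is twelve" is not: its probability changes when A is known to occur.
DEPENDENT = cond_prob(sum_is_twelve, first_is_six) != prob(sum_is_twelve)
```

Exact fractions are used so that the equality $P_A B = PB$ can be tested without rounding error.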

Similarly, if a collective trial is such that the probability of any out- 
come of any constituent random trial is independent of the observed 
outcomes of preceding constituents, we say that the constituent ran- 
dom trials are stochastically independent. Clearly, complete independ- 
ence defined in terms of occurrences implies stochastic independence 
defined in terms of probability alone. Thus, as long as we are concerned 
with stochastic independence only, the concept of repeated trials re- 
duces to that of identical and stochastically independent trials. 

3. Random variables. For a physicist, the outcomes are, in general,
values of an observable. From the gambler’s point of view, what 
counts is not the observed outcome of a random trial but the corresponding
gain or loss. In either case, when there is only a finite number
of possible outcomes, the sure outcome $\Omega$ is partitioned into a number
of disjoint outcomes $A_1, A_2, \cdots, A_m$. The random variable X, say,
the chance gain of the gambler, is stated by assigning to these outcomes
numbers $x_{A_1}, x_{A_2}, \cdots, x_{A_m}$, which may be positive, null, or negative.
The “average gain” in n repeated random trials is

\[ x_{A_1}\frac{n_{A_1}}{n} + x_{A_2}\frac{n_{A_2}}{n} + \cdots + x_{A_m}\frac{n_{A_m}}{n}. \]

Since the trial is random, this average clusters about
$x_{A_1}PA_1 + x_{A_2}PA_2 + \cdots + x_{A_m}PA_m$, which is defined as the expectation $EX$ of the random
variable X. It is easily seen that the averages of a sum of two random 
variables X and Y cluster about the sum of their averages, that is, 


E(X+ Y) = EX + EY. 


The concept of random variable is more general than that of a random 
event. In fact, we can assign to every random event A a random variable,
its indicator $I_A = 1$ or 0 according as A occurs or does not occur.
Then, the observed value of $I_A$ tells us whether or not A occurred, and
conversely. Furthermore, we have $EI_A = 1 \cdot PA + 0 \cdot PA^c = PA$.
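
A small numerical sketch may help fix the definition of expectation. It assumes a hypothetical game played with one homogeneous die, in which the sure outcome is partitioned into three outcomes; the gains and all names below are invented for the illustration.

```python
from fractions import Fraction

# Assumed partition of the sure outcome, with the probability of each part.
PARTITION = {
    "six": Fraction(1, 6),
    "even_not_six": Fraction(2, 6),
    "odd": Fraction(3, 6),
}

# Hypothetical chance gain of the gambler on each outcome of the partition.
GAIN = {"six": 10, "even_not_six": 1, "odd": -3}

def expectation(values, probabilities):
    """EX = sum of x_A * PA over the outcomes A of the partition."""
    return sum(values[name] * probabilities[name] for name in probabilities)

EXPECTED_GAIN = expectation(GAIN, PARTITION)

# Indicator of the event "six": 1 when it occurs, 0 otherwise.
INDICATOR_SIX = {name: 1 if name == "six" else 0 for name in PARTITION}
EXPECTED_INDICATOR = expectation(INDICATOR_SIX, PARTITION)
```

Here the expectation of the indicator reduces to the probability of the event, as in the relation $EI_A = PA$.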

A physical observable may have an infinite number of possible values, 
and then the foregoing simple definitions do not apply. The evolution 
of probability theory is due precisely to the consideration of more and 
more complicated observables. 


II. AXIOMS; INDEPENDENCE AND THE 
BERNOULLI CASE 


We give now a consistent model for the intuitive concepts which ap- 
peared in the foregoing brief analysis; we shall later see that this model 
has to be extended. 

1. Axioms of the finite case. Let $\Omega$ or the sure event be a space of
points $\omega$; the empty set (set containing no points $\omega$) or the impossible
event will be denoted by $\emptyset$. Let $\mathcal{A}$ be a nonempty class of sets in $\Omega$, to
be called random events or, simply, events, since no other type of events
will be considered. Events will be denoted by capitals $A, B, \cdots$ with
or without affixes. Let P or probability be a numerical function defined
on $\mathcal{A}$; the value of P for an event A will be called the probability
of A and will be denoted by $PA$. The pair $(\mathcal{A}, P)$ is called a probability
field and the triplet $(\Omega, \mathcal{A}, P)$ is called a probability space.


Axiom I. $\mathcal{A}$ is a field: complements $A^c$, finite intersections $\bigcap_{k=1}^{n} A_k$,
and finite unions $\bigcup_{k=1}^{n} A_k$ of events are events.


Axiom II. P on $\mathcal{A}$ is normed, nonnegative, and finitely additive:

\[ P\Omega = 1, \qquad PA \ge 0, \qquad P\sum_{k=1}^{n} A_k = \sum_{k=1}^{n} PA_k. \]

It suffices to assume additivity for two arbitrary disjoint events, since
the general case follows by induction.
Since $\emptyset$ is disjoint from any event A and $A + \emptyset = A$, we have

\[ PA = P(A + \emptyset) = PA + P\emptyset, \]

so that $P\emptyset = 0$. Furthermore, it is immediate that, if $A \subset B$, then
$PA \le PB$, and also that

\[ P\bigcup_{k=1}^{n} A_k = PA_1 + PA_1^c A_2 + \cdots + PA_1^c A_2^c \cdots A_{n-1}^c A_n \le \sum_{k=1}^{n} PA_k. \]
The axioms are consistent.
To see this, it suffices to construct an example in which the axioms
are both verified: take as the field $\mathcal{A}$ of events $\Omega$ and $\emptyset$ only, and set
$P\Omega = 1$, $P\emptyset = 0$. A less trivial example is that of a simple probability field:
1° The events, except $\emptyset$, are formed by all sums of disjoint events
$A_1, A_2, \cdots, A_n$ which form a finite partition of the sure event:
$A_1 + A_2 + \cdots + A_n = \Omega$; 2° to every event $A_k$ of the partition is assigned a
probability $p_k = PA_k$ such that every $p_k \ge 0$ and $\sum_{k=1}^{n} p_k = 1$; this is
always possible. Then P is defined on $\mathcal{A}$, consistently with axiom II,
by assigning to every event A as its probability the sum of probabilities
of those $A_k$ whose sum is A.
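
The construction of a simple probability field can be carried out explicitly. The sketch below assumes a hypothetical partition of the sure event into three events with probabilities 1/2, 1/3, 1/6, represents every event by the set of partition events whose sum it is, and checks axioms I and II; all names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Assumed finite partition of the sure event, with probabilities p_k = P(A_k).
ATOMS = {"A1": Fraction(1, 2), "A2": Fraction(1, 3), "A3": Fraction(1, 6)}
SURE = frozenset(ATOMS)

def build_field():
    """All sums of events of the partition (the empty sum is the impossible event)."""
    names = sorted(ATOMS)
    return {frozenset(c)
            for r in range(len(names) + 1)
            for c in combinations(names, r)}

def P(event):
    """Probability of an event: the sum of the p_k of those A_k whose sum it is."""
    return sum((ATOMS[name] for name in event), Fraction(0))

FIELD = build_field()

# Axiom I: complements, finite intersections and finite unions are events.
IS_FIELD = all(SURE - a in FIELD and (a | b) in FIELD and (a & b) in FIELD
               for a in FIELD for b in FIELD)

# Axiom II: P is normed, nonnegative, and additive on disjoint events.
IS_NORMED = P(SURE) == 1
IS_NONNEGATIVE = all(P(a) >= 0 for a in FIELD)
IS_ADDITIVE = all(P(a | b) == P(a) + P(b)
                  for a in FIELD for b in FIELD if not a & b)
```

Checking additivity on pairs of disjoint events is enough here, since the general finite case follows by induction.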

2. Simple random variables. Let the probability field $(\mathcal{A}, P)$ be
fixed. In order to introduce the concept of random variables, it will be 
convenient to begin with very special ones, which permit operations on 
events to be transformed into ordinary algebraic operations. 

To every event A we assign a function $I_A$ on $\Omega$ with values $I_A(\omega)$,
such that $I_A(\omega) = 1$ or 0 according as $\omega$ belongs or does not belong to
A; $I_A$ will be called the indicator of A (in terms of occurrences, $I_A = 1$
or 0, according as A occurs or does not occur). Thus, $I_A^2 = I_A$ and
the boundary cases are those of $I_\emptyset = 0$ and $I_\Omega = 1$ (if, in a relation
containing functions of an argument, the argument does not figure, 
then the relation holds for all values of the argument unless otherwise 
stated). 

The following properties are immediate: 


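if $A \subset B$, then $I_A \le I_B$, and conversely;

if $A = B$, then $I_A = I_B$, and conversely;

\[ I_{A^c} = 1 - I_A, \qquad I_{AB} = I_A I_B, \qquad I_{A+B} = I_A + I_B, \]
\[ I_{A \cup B} = I_{A + A^c B} = I_A + I_B - I_{AB} \]

and, more generally,

\[ I_{\bigcap_{k=1}^{n} A_k} = \prod_{k=1}^{n} I_{A_k}, \qquad I_{\sum_{k=1}^{n} A_k} = \sum_{k=1}^{n} I_{A_k}, \]

\[ I_{\bigcup_{k=1}^{n} A_k} = I_{A_1} + (1 - I_{A_1}) I_{A_2} + \cdots + (1 - I_{A_1}) \cdots (1 - I_{A_{n-1}}) I_{A_n}. \]
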
Linear combinations $X = \sum_{j=1}^{m} x_j I_{A_j}$ of indicators of events $A_j$ of a finite
partition of $\Omega$, where the $x_j$ are (finite) numbers, are called simple
random variables, to be denoted by capitals $X, Y, \cdots$, with or without
affixes. By convention, every written linear combination of indicators
will be that of indicators of disjoint events whose sum is the sure event;
however, when $x_j = 0$, we may drop the corresponding null term $x_j I_{A_j} = 0$
from the linear combination. The set of values $PA_j$ which correspond
to the values $x_j$ of X, assumed all distinct, is called the probability
distribution and the $A_j$ form the partition of X. The expectation $EX$
of a simple random variable $X = \sum_{j=1}^{m} x_j I_{A_j}$ is defined by

\[ EX = \sum_{j=1}^{m} x_j PA_j. \]
Clearly, any constant c is a simple random variable, and the sum or the
product of two simple random variables is a simple random variable;
$E(c) = c$, $EcX = cEX$; if $X \ge 0$, that is, all its values $x_j \ge 0$, then
$EX \ge 0$; if $X \le Y$, then $EX \le EY$. Furthermore, expectations possess
the following basic property.


ADDITION PROPERTY. The expectation of a sum of (a finite number of) 
simple random variables is the sum of their expectations. 


It suffices to prove the assertion for a sum of two simple random variables

\[ X = \sum_{j=1}^{m} x_j I_{A_j}, \qquad Y = \sum_{k=1}^{n} y_k I_{B_k}, \]

since the general case follows by induction. Because of the properties
of probabilities and indicators given above,

\[ EX + EY = \sum_{j=1}^{m} x_j PA_j + \sum_{k=1}^{n} y_k PB_k = \sum_{j=1}^{m} \sum_{k=1}^{n} (x_j + y_k) PA_j B_k \]

while

\[ E(X + Y) = E \sum_{j=1}^{m} \sum_{k=1}^{n} (x_j + y_k) I_{A_j B_k} = \sum_{j=1}^{m} \sum_{k=1}^{n} (x_j + y_k) PA_j B_k, \]

and the conclusion is reached.
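
The addition property does not require independence, and this can be seen on an example. The sketch below assumes the hypothetical model of two homogeneous dice, takes for X and Y the larger and the smaller of the two values (which are clearly dependent), and computes each expectation from the probability distribution of the variable concerned; the function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed model: 36 equally likely outcomes of a throw of two dice.
OMEGA = list(product(range(1, 7), repeat=2))
P_POINT = Fraction(1, len(OMEGA))

def distribution(f):
    """Probabilities P[f = x] of the distinct values x of a simple random variable."""
    dist = defaultdict(Fraction)
    for w in OMEGA:
        dist[f(w)] += P_POINT
    return dict(dist)

def E(f):
    """Expectation: sum of x * P[f = x] over the distinct values x."""
    return sum(x * p for x, p in distribution(f).items())

def X(w):
    return max(w)  # larger of the two values

def Y(w):
    return min(w)  # smaller of the two values

E_OF_SUM = E(lambda w: X(w) + Y(w))
SUM_OF_E = E(X) + E(Y)
```

The partition of X + Y differs from those of X and of Y, yet the two computations agree.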
Application to probabilities of combinations of events. To begin with, 
we observe that 
\[ EI_A = 1 \cdot PA + 0 \cdot PA^c = PA. \]
Therefore, from
\[ I_{A \cup B} = I_A + I_B - I_{AB} \]
it follows, upon taking expectations of both sides, that
\[ P(A \cup B) = PA + PB - PAB. \]

Similarly, from

\[ I_{A \cup B \cup C} = I_A + (1 - I_A) I_B + (1 - I_A)(1 - I_B) I_C \]
it follows, upon expanding the right-hand side and taking expectations, 
that 
P(AU BUC) = PA+ PB+ PC — PAB — PBC — PCA + PABC, 
and so on. 
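
The formula for three events can be checked by exact counting. The sketch below again assumes the hypothetical model of two homogeneous dice; the three events chosen are arbitrary and the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed model: 36 equally likely outcomes of a throw of two dice.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def A(w):
    return w[0] % 2 == 0      # first die shows an even value

def B(w):
    return w[0] + w[1] >= 9   # the sum is at least nine

def C(w):
    return w[0] == w[1]       # both dice show the same value

P_UNION = prob(lambda w: A(w) or B(w) or C(w))
P_EXPANDED = (prob(A) + prob(B) + prob(C)
              - prob(lambda w: A(w) and B(w))
              - prob(lambda w: B(w) and C(w))
              - prob(lambda w: C(w) and A(w))
              + prob(lambda w: A(w) and B(w) and C(w)))
```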


The foregoing properties of expectations lead to the celebrated 


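TCHEBICHEV INEQUALITY. If X is a simple random variable, then, for
every $\epsilon > 0$,

\[ P[\,|X| \ge \epsilon\,] \le \frac{1}{\epsilon^2} EX^2. \]

$[\,|X| \ge \epsilon\,]$ is to be read: the union of all those events for which the
values of $|X|$ are $\ge \epsilon$.
The inequality follows from

\[ EX^2 = E(X^2 I_{[|X| \ge \epsilon]}) + E(X^2 I_{[|X| < \epsilon]}) \ge E(X^2 I_{[|X| \ge \epsilon]}) \ge \epsilon^2 EI_{[|X| \ge \epsilon]} = \epsilon^2 P[\,|X| \ge \epsilon\,]. \]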
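
To see how the inequality behaves on a concrete simple random variable, the sketch below assumes the hypothetical model of two homogeneous dice and takes for X the sum of the two values minus 7. It compares $P[\,|X| \ge \epsilon\,]$ with the bound $EX^2/\epsilon^2$ for a few values of $\epsilon$; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed model: 36 equally likely outcomes of a throw of two dice.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def E(f):
    """Expectation of a simple random variable f on OMEGA."""
    return sum(Fraction(f(w)) for w in OMEGA) / len(OMEGA)

def X(w):
    return w[0] + w[1] - 7  # centered sum of the two values

SECOND_MOMENT = E(lambda w: X(w) ** 2)

def tail(eps):
    """P[|X| >= eps]."""
    return prob(lambda w: abs(X(w)) >= eps)

# For each eps: (exact probability, Tchebichev bound).
COMPARISON = {eps: (tail(eps), SECOND_MOMENT / eps ** 2) for eps in (1, 2, 3, 4, 5)}
BOUND_HOLDS = all(exact <= bound for exact, bound in COMPARISON.values())
```

For small $\epsilon$ the bound exceeds 1 and says nothing; it becomes informative only as $\epsilon$ grows.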


3. Independence. Two events $A_1$, $A_2$ are said to be stochastically
independent or, simply, independent (no other type of independence of
events will be considered) if

\[ PA_1 A_2 = PA_1 PA_2. \]

More generally, events $A_k$, $k = 1, 2, \cdots, n$ are independent, if, for every
$m \le n$ and for arbitrary distinct integers $k_1, k_2, \cdots, k_m \le n$,

\[ PA_{k_1} A_{k_2} \cdots A_{k_m} = PA_{k_1} PA_{k_2} \cdots PA_{k_m}. \]

If this property holds for all events $A_k$ selected arbitrarily each within
a different class $\mathcal{A}_k$, we say that these classes are independent. Simple
random variables $X_k$, $k = 1, 2, \cdots, n$, are said to be independent if the
partitions on which they are defined are independent. A basic property
of independent simple random variables is the following


MULTIPLICATION PROPERTY. The expectation of a product (of a finite
number) of independent simple random variables is the product of their
expectations.
It suffices to give the proof for two independent simple random variables,

\[ X = \sum_{j=1}^{m} x_j I_{A_j}, \qquad Y = \sum_{k=1}^{n} y_k I_{B_k}, \qquad \text{all } x_j\ (y_k) \text{ distinct,} \]

since the general case follows by induction. Because of independence,

\[ EXY = E \sum_{j=1}^{m} \sum_{k=1}^{n} x_j y_k I_{A_j B_k} = \sum_{j=1}^{m} \sum_{k=1}^{n} x_j y_k PA_j PB_k = \Big( \sum_{j=1}^{m} x_j PA_j \Big) \Big( \sum_{k=1}^{n} y_k PB_k \Big) = EX\,EY, \]

and the conclusion is reached.
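
The role of independence in the multiplication property can be illustrated as follows. The sketch assumes the hypothetical model of two homogeneous dice: the two values X and Y have independent partitions, while X and the larger value Z do not. The helper `independent` tests the product relation on every pair of events of the two partitions; all names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed model: 36 equally likely outcomes of a throw of two dice.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def E(f):
    """Expectation of a simple random variable f on OMEGA."""
    return sum(Fraction(f(w)) for w in OMEGA) / len(OMEGA)

def independent(f, g):
    """True if P[f = x, g = y] = P[f = x] P[g = y] for all values x, y."""
    xs = {f(w) for w in OMEGA}
    ys = {g(w) for w in OMEGA}
    return all(
        prob(lambda w: f(w) == x and g(w) == y)
        == prob(lambda w: f(w) == x) * prob(lambda w: g(w) == y)
        for x in xs for y in ys)

def X(w):
    return w[0]    # value of the first die

def Y(w):
    return w[1]    # value of the second die

def Z(w):
    return max(w)  # larger of the two values

PRODUCT_RULE_XY = E(lambda w: X(w) * Y(w)) == E(X) * E(Y)
PRODUCT_RULE_XZ = E(lambda w: X(w) * Z(w)) == E(X) * E(Z)
```

In this model the product relation holds for the independent pair and fails for the dependent one.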
The expectation $E(X - EX)^2$, called the variance of X, is denoted
by $\sigma^2 X$. By the additive property,

\[ \sigma^2 X = E(X^2 - 2X\,EX + E^2X) = EX^2 - E^2X. \]


The celebrated Bienaymé equality follows from the additive and mul- 
tiplicative properties. 


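BIENAYMÉ EQUALITY. If $X_k$, $k = 1, 2, \cdots, n$, are independent, then

\[ \sigma^2 \sum_{k=1}^{n} X_k = \sum_{k=1}^{n} \sigma^2 X_k. \]

Since

\[ E(X_k - EX_k) = EX_k - EX_k = 0 \]

and independence of the $X_k$ implies independence of the $X_k - EX_k$, it
follows that

\[ \begin{aligned}
\sigma^2 \sum_{k=1}^{n} X_k &= E\Big( \sum_{k=1}^{n} X_k - \sum_{k=1}^{n} EX_k \Big)^2 = E\Big\{ \sum_{k=1}^{n} (X_k - EX_k) \Big\}^2 \\
&= \sum_{k=1}^{n} E(X_k - EX_k)^2 + \sum_{j \ne k = 1}^{n} E(X_j - EX_j)(X_k - EX_k) \\
&= \sum_{k=1}^{n} \sigma^2 X_k + \sum_{j \ne k = 1}^{n} E(X_j - EX_j)\,E(X_k - EX_k) = \sum_{k=1}^{n} \sigma^2 X_k.
\end{aligned} \]

Observe that we used independence of the $X_k$ considered two by two
only.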
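
The closing remark can be illustrated by variables which are independent two by two without being independent. The sketch below assumes a hypothetical model of two fair coins, with $X_1$, $X_2$ the indicators of heads and $X_3$ the indicator of the event that exactly one head occurs; the names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed model: 4 equally likely outcomes of two fair coins (1 = heads).
OMEGA = list(product((0, 1), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def E(f):
    """Expectation of a simple random variable f on OMEGA."""
    return sum(Fraction(f(w)) for w in OMEGA) / len(OMEGA)

def variance(f):
    """Variance: E(f - Ef)^2."""
    mean = E(f)
    return E(lambda w: (f(w) - mean) ** 2)

def X1(w):
    return w[0]

def X2(w):
    return w[1]

def X3(w):
    return w[0] ^ w[1]  # 1 if exactly one head occurs

VARIABLES = [X1, X2, X3]

# Independence two by two: the product relation holds for each pair of events [Xi = 1], [Xj = 1].
PAIRWISE = all(
    prob(lambda w: f(w) == 1 and g(w) == 1) == prob(lambda w: f(w) == 1) * prob(lambda w: g(w) == 1)
    for i, f in enumerate(VARIABLES) for g in VARIABLES[i + 1:])

# The three events together violate the product relation.
TRIPLE = prob(lambda w: X1(w) == 1 and X2(w) == 1 and X3(w) == 1) == Fraction(1, 8)

VARIANCE_OF_SUM = variance(lambda w: X1(w) + X2(w) + X3(w))
SUM_OF_VARIANCES = sum(variance(f) for f in VARIABLES)
```

Since the events here have only two alternatives each, checking the product relation on the events $[X_i = 1]$ suffices for the pairs; the Bienaymé equality holds although the three variables are not independent.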

4. Bernoulli case. A simple case of independence has played a central
role in the evolution of probability theory. This is the Bernoulli
case of events $A_k$, $k = 1, 2, \cdots$, which are independent whatever be
their total number n under consideration and such that their probabilities
$PA_k$ have the same value p.

We observe that independence of the $A_k$, $k = 1, 2, \cdots, n$ implies
independence of the $A'_k = A_k$ or $A_k^c$, and, more generally, of the n
fields $\mathcal{A}_k = \{\emptyset, A_k, A_k^c, \Omega\}$. For example,

\[ \begin{aligned}
PA_{k_1}^c A_{k_2} \cdots A_{k_m} &= PA_{k_2} \cdots A_{k_m} - PA_{k_1} A_{k_2} \cdots A_{k_m} \\
&= PA_{k_2} \cdots PA_{k_m} - PA_{k_1} PA_{k_2} \cdots PA_{k_m} \\
&= (1 - PA_{k_1}) PA_{k_2} \cdots PA_{k_m} = PA_{k_1}^c PA_{k_2} \cdots PA_{k_m},
\end{aligned} \]

where the subscripts are all distinct and $\le n$. These fields correspond
to repeated random trials where an outcome A at the kth trial is represented
by $A_k$.

The number of occurrences of outcome A in n repeated trials is represented
by a simple random variable $S_n = \sum_{k=1}^{n} I_{A_k}$. To write $S_n$ in the
usual form, that is, with values assigned to events of a partition of the
sure event, we observe that

\[ I_{A_k} = I_{A_k} \prod_{\substack{j=1 \\ j \ne k}}^{n} (I_{A_j} + I_{A_j^c}). \]

It follows, upon substituting in $S_n$ and expanding, that

\[ S_n = \sum_{j=0}^{n} j\, I_{B_j} \]

where

\[ I_{B_j} = \sum I_{A_{k_1}} \cdots I_{A_{k_j}} I_{A_{k_{j+1}}^c} \cdots I_{A_{k_n}^c}. \]

The summation is over all permutations of subscripts $k = 1, 2, \cdots, n$,
classified into two groups, one having j terms and the other having
$n - j$ terms.

On account of the independence, the expectations of the terms under
the summation sign are

\[ PA_{k_1} \cdots PA_{k_j} PA_{k_{j+1}}^c \cdots PA_{k_n}^c = p^j q^{n-j}, \qquad q = 1 - p, \]

and, therefore, the probability of j occurrences in n trials is given by

\[ P[S_n = j] = PB_j = \frac{n!}{j!\,(n-j)!}\, p^j q^{n-j}, \qquad j = 0, 1, \cdots, n. \]

With this result we can compute directly the expectation and variance
of $S_n$, but we prefer to use the additive property which gives

\[ ES_n = E \sum_{k=1}^{n} I_{A_k} = \sum_{k=1}^{n} PA_k = np, \]

and the Bienaymé equality for independent random variables $I_{A_k}$, which
gives

\[ \sigma^2 S_n = \sum_{k=1}^{n} \sigma^2 I_{A_k} = npq, \]

since

\[ \sigma^2 I_{A_k} = E(I_{A_k} - EI_{A_k})^2 = EI_{A_k}^2 - E^2 I_{A_k} = EI_{A_k} - E^2 I_{A_k} = p - p^2 = pq. \]
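
The direct computation which the text sets aside can be sketched in a few lines. The code below assumes the hypothetical values n = 6 and p = 1/3, obtains $P[S_n = j]$ both by enumerating all sequences of occurrences and nonoccurrences and from the formula, and then computes the expectation and variance from the distribution; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def pmf_by_enumeration(n, p):
    """P[S_n = j] obtained by summing p^j q^(n-j) over all 2^n sequences of outcomes."""
    q = 1 - p
    pmf = [Fraction(0)] * (n + 1)
    for outcome in product((0, 1), repeat=n):  # 1 marks an occurrence of A
        j = sum(outcome)
        pmf[j] += p ** j * q ** (n - j)
    return pmf

def pmf_by_formula(n, p):
    """P[S_n = j] = n! / (j! (n-j)!) * p^j q^(n-j)."""
    q = 1 - p
    return [comb(n, j) * p ** j * q ** (n - j) for j in range(n + 1)]

# Assumed values for the illustration.
N, P_SUCCESS = 6, Fraction(1, 3)

PMF = pmf_by_formula(N, P_SUCCESS)
MEAN = sum(j * pj for j, pj in enumerate(PMF))
VARIANCE = sum(j * j * pj for j, pj in enumerate(PMF)) - MEAN ** 2
```

Both routes give np for the expectation and npq for the variance, in agreement with the shorter argument by the additive property and the Bienaymé equality.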


In order to justify the model investigated so far, we ought to give a
precise and acceptable “meaning” to the notion of “clustering of frequencies”
which, as we have seen, is at the very root of the interpretation
of randomness. The most celebrated interpretation, and rightly so,
is the following


BERNOULLI LAW OF LARGE NUMBERS (1713). In the Bernoulli case,
for every $\epsilon > 0$, as $n \to \infty$,

\[ P\Big[\,\Big| \frac{S_n}{n} - p \Big| \ge \epsilon \Big] \to 0. \]

In other words, the probability distribution of values of the frequency
$S_n/n$ of an outcome in n repeated trials concentrates at the value p of
the probability of the outcome, as the number of trials increases
indefinitely.

The proof is immediate for, upon applying the Tchebichev inequality,
we have, as $n \to \infty$,

\[ P\Big[\,\Big| \frac{S_n}{n} - p \Big| \ge \epsilon \Big] = P[\,|S_n - ES_n| \ge \epsilon n\,] \le \frac{1}{\epsilon^2 n^2}\, \sigma^2 S_n = \frac{pq}{\epsilon^2 n} \to 0. \]

Observe that only independence two by two has been required.

A particular sequence of Bernoulli cases, introduced by Poisson, 
shows that the finite setup considered so far is not satisfactory, at least 
from the sophisticated mathematician’s point of view. 

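Consider a sequence of Bernoulli cases of independent events $A_{nk}$,
$k = 1, 2, \cdots, n$; $n = 1, 2, \cdots$, of the same probability $p_n$ which varies
with the number n of trials in such a manner that the expectation of
the number of occurrences $S_n = \sum_{k=1}^{n} I_{A_{nk}}$ remains constant:
$ES_n = np_n = \lambda$. Then, as $n \to \infty$ while j remains fixed,

\[ \begin{aligned}
P[S_n = j] &= \frac{n!}{j!\,(n-j)!}\, p_n^j (1 - p_n)^{n-j} \\
&= \frac{\lambda^j}{j!} \Big(1 - \frac{1}{n}\Big) \Big(1 - \frac{2}{n}\Big) \cdots \Big(1 - \frac{j-1}{n}\Big) \Big(1 - \frac{\lambda}{n}\Big)^{n-j} \to \frac{\lambda^j}{j!}\, e^{-\lambda},
\end{aligned} \]

and we have the following


POISSON THEOREM (1832). If $S_n = \sum_{k=1}^{n} I_{A_{nk}}$ is the sum of indicators of
independent equiprobable events, such that the expectation $ES_n = \lambda > 0$
remains constant as n varies, then, as $n \to \infty$,

\[ P[S_n = j] \to \frac{\lambda^j}{j!}\, e^{-\lambda}, \qquad j = 0, 1, 2, \cdots. \]

Since

\[ \sum_{j=0}^{\infty} \frac{\lambda^j}{j!}\, e^{-\lambda} = e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} = 1, \]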


we can say that, in the foregoing passage to the limit, no positive proba- 
bility escapes to infinity. The total probability is now distributed 
among a denumerable number of values $j = 0, 1, 2, \cdots$, provided we
assume that the probability of the sum of a denumerable number of 
disjoint events [5,, = j] is the sum of their probabilities. However, in 
the setup of § 1 neither a denumerable sum of events nor the property 
just stated has content. Thus, if we want to give an interpretation to 
Poisson’s result, we have to expand the model so as to include the pre- 
ceding possibilities. 
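
The passage to the limit in Poisson's result can be observed numerically. The sketch below assumes the hypothetical value $\lambda = 2$, compares the probabilities $P[S_n = j]$ for $p_n = \lambda/n$ with the limits $\lambda^j e^{-\lambda}/j!$ for $j \le 10$, and sums the limits over a long initial range of j; floating-point arithmetic is used, so the comparisons are approximate. The names are illustrative.

```python
from math import comb, exp, factorial

def bernoulli_case_pmf(n, p, j):
    """P[S_n = j] for n independent events of the same probability p."""
    return comb(n, j) * p ** j * (1 - p) ** (n - j)

def poisson_limit(lam, j):
    """Limit value lam^j e^(-lam) / j!."""
    return lam ** j / factorial(j) * exp(-lam)

# Assumed constant expectation E S_n = n p_n.
LAMBDA = 2.0

def largest_gap(n, j_max=10):
    """Largest difference between P[S_n = j] and its limit, for j = 0, ..., j_max."""
    p_n = LAMBDA / n
    return max(abs(bernoulli_case_pmf(n, p_n, j) - poisson_limit(LAMBDA, j))
               for j in range(j_max + 1))

GAPS = [largest_gap(n) for n in (10, 100, 1000, 10000)]

# The limits add up to 1 (here summed over j < 60, the remainder being negligible).
TOTAL = sum(poisson_limit(LAMBDA, j) for j in range(60))
```

The gaps shrink as n grows, and the limiting values carry the whole probability, which is the sense in which no positive probability escapes to infinity.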

5. Axioms for the countable case. As soon as the concept of infinity
appears, intuition fails and the vague everyday idea of randomness
yields nothing. A first and obvious way to pass from the finite to the
infinite is to extrapolate, that is, to postulate that properties of the
finite case continue to hold in the infinite case. Yet these extrapolations
have to be meaningful and consistent.
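In set theory, intersections $\bigcap_{n=1}^{\infty} A_n$ and unions $\bigcup_{n=1}^{\infty} A_n$ of sets $A_n$, where
n runs over the denumerable set of integers, continue to be defined as
the sets of points which belong to every $A_n$ and to at least one $A_n$,
respectively. We still have that

\[ \Big( \bigcap_{n=1}^{\infty} A_n \Big)^c = \bigcup_{n=1}^{\infty} A_n^c, \qquad \Big( \bigcup_{n=1}^{\infty} A_n \Big)^c = \bigcap_{n=1}^{\infty} A_n^c, \]

\[ \bigcup_{n=1}^{\infty} A_n = A_1 + A_1^c A_2 + A_1^c A_2^c A_3 + \cdots \text{ ad infinitum} \]

and, correspondingly,

\[ I_{\bigcap_{n=1}^{\infty} A_n} = \prod_{n=1}^{\infty} I_{A_n}, \qquad I_{\sum_{n=1}^{\infty} A_n} = \sum_{n=1}^{\infty} I_{A_n}, \]

\[ I_{\bigcup_{n=1}^{\infty} A_n} = I_{A_1} + I_{A_1^c} I_{A_2} + I_{A_1^c} I_{A_2^c} I_{A_3} + \cdots. \]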

If we want all countable (finite or denumerable) combinations of events
by means of “not,” “and,” “or,” to be events and their probabilities
to be defined, then axioms I and II become


Axiom I′. Events form a $\sigma$-field $\mathcal{A}$: Complements $A^c$, countable
intersections $\bigcap_j A_j$, and countable unions $\bigcup_j A_j$ of events are events.


Axiom II′. Probability P on $\mathcal{A}$ is normed, nonnegative, and $\sigma$-additive:

\[ P\Omega = 1, \qquad PA \ge 0, \qquad P\sum_j A_j = \sum_j PA_j. \]

It follows that

COVERING RULE:
\[ P\bigcup_j A_j = PA_1 + PA_1^c A_2 + PA_1^c A_2^c A_3 + \cdots \le \sum_j PA_j. \]
These axioms are consistent, since the examples constructed for the
finite case continue to apply trivially. A nontrivial example in the
infinite case is that of nonsimple elementary probability fields: 1° The
events, except $\emptyset$, are formed by all countable sums of events $A_n$ which
form a denumerable partition of the sure event: $\sum_{n=1}^{\infty} A_n = \Omega$; 2° to
every event $A_n$ of the partition is assigned probability $p_n = PA_n$ such
that every $p_n \ge 0$ and $\sum_{n=1}^{\infty} p_n = 1$; this is always possible. Then $P$ is
defined on $\mathcal{A}$, consistently with axiom II', by assigning to every event
$A$ as its probability the sum (finite sum or convergent series) of
probabilities of those $A_n$ whose sum is $A$.

6. Elementary random variables. A linear combination
$X = \sum_j x_j I_{A_j}$ of a countable number of indicators of disjoint events $A_j$ is an
elementary random variable $X$; if $j$ varies over a finite set, then $X$
reduces to a simple random variable. Clearly a sum or a product of two
elementary random variables is an elementary random variable. We
may still try to define the expectation $EX$ by

$$EX = \sum_j x_j PA_j.$$


But, if the sum is a divergent series, it has no content or is infinite. 
Furthermore, even if it is a convergent series, it may not be absolutely 
convergent, so that by changing the order of terms we can change its 
value, and the expectation is no longer well defined if no ordering is 
specified; this is undesirable according to the very meaning of an
expectation. We are therefore led to define $EX$ by the foregoing
expression only when the right-hand side is absolutely convergent, so that

if $EX$ exists and is finite, then $E|X|$ exists and is finite; and conversely.


(We recognize here an integrable elementary function in the sense of 
Lebesgue with respect to the measure P.) 

The argument used to prove the addition property of simple random
variables continues to apply to finite sums of elementary random
variables whose expectations exist and are finite, provided $\sigma$-additivity of
$P$ is used. We obtain:

If the expectations of a finite number of elementary random variables
exist and are finite, then the expectation of their sum exists and is finite
and is the sum of their expectations.


Also, Tchebichev’s inequality remains valid, provided its right-hand 
side exists and is finite. 

Independence of a countable number of events $A_j$, or $\sigma$-fields $\mathcal{A}_j$
contained in $\mathcal{A}$, is defined to be independence of every finite number of
these events, or $\sigma$-fields. Independence of a countable number of
elementary random variables $X_k = \sum_j x_{kj} I_{A_{kj}}$ is defined to be independence
of every finite number of events $A_{kj_k}$, as $k$ varies. The argument used
to prove the multiplication property yields:

If the expectations of a finite number of independent elementary random
variables exist and are finite, then the expectation of their product exists
and is the finite product of their expectations.


Also, Bienaymé’s equality remains valid, provided its right-hand side 
exists and is finite. 


In the Bernoulli law of large numbers only simple random variables 
figure and only finite additivity of the probability P is used, so that 
nothing is to be changed. However, now we can introduce probabilities 
of denumerable combinations of events and use the supplementary re- 
quirement that the additive property of P remains valid for denumera- 
ble sums. Therefore, in the present setup we can expect a more pre- 
cise interpretation of the “clustering of frequencies.” This is the cele- 
brated Borel strong law of large numbers derived below. 

Let $X_1, X_2, \cdots$ be a sequence of elementary random variables. We
investigate the convergence to 0 of the sequence; the limits are taken
as $n \to \infty$. It will be more convenient to consider the contrary case:
$X_n$ does not converge to 0 or, equivalently, there exists at least one
integer $m$ such that to every integer $n$ there corresponds at least one
integer $\nu$ for which $|X_{n+\nu}| \ge \frac{1}{m}$. Since "at least one" corresponds to
"$\bigcup$" while "every" corresponds to "$\bigcap$," we can write

$$[X_n \nrightarrow 0] = \bigcup_{m=1}^{\infty} \bigcap_{n=1}^{\infty} \bigcup_{\nu=1}^{\infty} \Big[|X_{n+\nu}| \ge \frac{1}{m}\Big];$$

the right-hand side is an event. Thus, the condition $X_n \nrightarrow 0$ determines
the event $[X_n \nrightarrow 0]$, the contrary condition $X_n \to 0$ determines
the complementary event $[X_n \to 0]$, and the probabilities of these two
events add up to 1.

We are interested in $X_n \to 0$ with probability 1 or, equivalently,
$X_n \nrightarrow 0$ with probability 0, and require the following proposition.

If, for every integer $m$, $\sum_{n=1}^{\infty} P\big[|X_n| \ge \frac{1}{m}\big] < \infty$, then $P[X_n \nrightarrow 0] = 0$.

We set $A_{nm} = \bigcup_{\nu=1}^{\infty} \big[|X_{n+\nu}| \ge \frac{1}{m}\big]$ and $A_m = \bigcap_{n=1}^{\infty} A_{nm}$ and observe that,
by the covering rule and the hypothesis, for every $m$,

$$PA_{nm} = P\bigcup_{k=n+1}^{\infty} \Big[|X_k| \ge \frac{1}{m}\Big] \le \sum_{k=n+1}^{\infty} P\Big[|X_k| \ge \frac{1}{m}\Big] \to 0 \quad \text{as } n \to \infty.$$

Since, whatever be $n'$,

$$PA_m = P\bigcap_{n=1}^{\infty} A_{nm} \le PA_{n'm},$$

it follows upon letting $n' \to \infty$ that $PA_m = 0$. Therefore, by the
covering rule,

$$P[X_n \nrightarrow 0] = P\bigcup_{m=1}^{\infty} A_m \le \sum_{m=1}^{\infty} PA_m = 0$$

and the proposition is proved.
We can now pass to

BOREL'S STRONG LAW OF LARGE NUMBERS (1909). In the Bernoulli case

$$P\Big[\frac{S_n}{n} \to p\Big] = 1.$$

We recall that in the Bernoulli case

$$X_n = \frac{S_n}{n} = \frac{1}{n} \sum_{j=1}^{n} I_{A_j},$$

where the $A_j$ are independent events of common probability $p$ whatever
be $j$, and $EX_n = p$, $\sigma^2 X_n = pq/n$ (observe that only independence two
by two is used). Since, by Tchebichev's inequality, for every $m$

$$\sum_{k=1}^{\infty} P\Big[|X_{k^2} - p| \ge \frac{1}{m}\Big] \le m^2 \sum_{k=1}^{\infty} \frac{pq}{k^2} < \infty,$$

it follows by the foregoing proposition that $X_{k^2} \to p$ with probability 1
as $k \to \infty$. But to every $n$ there corresponds an integer $k = k(n)$ with
$k^2 \le n < (k+1)^2$; hence $0 \le n - k^2 \le 2k$ and $n \to \infty$ implies $k \to \infty$.
Since

$$|X_n - X_{k^2}| = \Big|\Big(\frac{1}{n} - \frac{1}{k^2}\Big) \sum_{j=1}^{k^2} I_{A_j} + \frac{1}{n} \sum_{j=k^2+1}^{n} I_{A_j}\Big| \le \frac{(n - k^2)k^2}{nk^2} + \frac{n - k^2}{n} \le \frac{4}{k}$$

so that

$$|X_n - p| \le |X_n - X_{k^2}| + |X_{k^2} - p| \le \frac{4}{k} + |X_{k^2} - p|,$$

it follows that $X_n \to p$ with probability 1 as $n \to \infty$, and Borel's
result is proved.
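
The interpolation step of the proof, $|X_n - X_{k^2}| \le 4/k$ for $k^2 \le n < (k+1)^2$, holds for every sequence of indicators, random or not, and can be checked directly. The sketch below does so on simulated trials; the seed, the probability `p = 0.3`, and the number of trials are assumptions made only for illustration.

```python
import random
from math import isqrt


def running_frequencies(indicators):
    """Return [X_1, ..., X_N] with X_n = (I_1 + ... + I_n) / n."""
    freqs, total = [], 0
    for n, value in enumerate(indicators, start=1):
        total += value
        freqs.append(total / n)
    return freqs


def interpolation_bound_holds(indicators, tol=1e-12):
    """Check |X_n - X_{k^2}| <= 4/k for every n, where k^2 <= n < (k+1)^2."""
    X = running_frequencies(indicators)
    for n in range(1, len(X) + 1):
        k = isqrt(n)  # largest integer with k^2 <= n
        if abs(X[n - 1] - X[k * k - 1]) > 4 / k + tol:
            return False
    return True


rng = random.Random(0)  # fixed seed, purely for reproducibility
p = 0.3                 # assumed common probability of the events
trials = [1 if rng.random() < p else 0 for _ in range(20_000)]

print(interpolation_bound_holds(trials))
print(running_frequencies(trials)[-1])  # observed frequency after 20,000 trials
```

The bound is what allows convergence along the sparse subsequence $k^2$ to be extended to the whole sequence.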

* Application. Let $X$ be an elementary random variable. We set
$F(x - 0) = F(x) = P[X < x]$, $F(x + 0) = P[X \le x]$ so that
$P[X = x] = F(x + 0) - F(x)$. The function $F$ so defined determines the
probability distribution of $X$, that is, the probabilities of all values of $X$;
it is called the distribution function of $X$. We organize repeated
independent trials where we observe the values of $X$; in other words, we
consider independent random variables $X_1, X_2, \cdots$ with the same
probability distribution as $X$.

If $k$ is the number of values observed in $n$ of those trials and which
are less than $x$ or, equivalently, if $k$ is the number of independent events
$[X_1 < x], [X_2 < x], \cdots, [X_n < x]$ (with common probability $p = F(x)$)
which occur, we set $F_n(x - 0) = F_n(x) = k/n$. Thus, $F_n(x)$ is a
random variable with

$$P\Big[F_n(x) = \frac{k}{n}\Big] = \frac{n!}{k!\,(n-k)!} \{F(x)\}^k \{1 - F(x)\}^{n-k}.$$

The function $F_n$ is called empirical distribution function of $X$ in $n$ trials.
According to Borel's strong law of large numbers, this frequency $F_n(x)$
of occurrences of the outcome $[X < x]$ converges to $F(x)$ with
probability 1. In other words, the observations permit us to find with
probability 1 every value $F(x)$ of the distribution function of $X$. In fact,
Borel's result yields more (Glivenko-Cantelli):


CENTRAL STATISTICAL THEOREM. If $F$ is the distribution function of
a random variable $X$ and $F_n$ is the empirical distribution function of $X$ in
$n$ independent and identical trials, then

$$P\Big[\sup_{-\infty < x < +\infty} |F_n(x) - F(x)| \to 0\Big] = 1.$$


In other words, with probability 1, $F_n(x) \to F(x)$ uniformly in $x$.

Let $x_{jk}$ be the smallest value $x$ such that

$$F(x) \le \frac{j}{k} \le F(x + 0).$$

Since the frequency of the event $[X < x_{jk}]$ is $F_n(x_{jk})$ and its probability
is $F(x_{jk})$, it follows by Borel's result that $PA'_{jk} = 1$ where
$A'_{jk} = [F_n(x_{jk}) \to F(x_{jk})]$. Similarly, $PA''_{jk} = 1$ where
$A''_{jk} = [F_n(x_{jk} + 0) \to F(x_{jk} + 0)]$. Let $A_{jk} = A'_{jk} A''_{jk}$ and let, with
$\theta = \pm 0$ (so that both one-sided limits are covered),

$$A_k = \bigcap_{j=1}^{k} A_{jk} = \Big[\sup_{1 \le j \le k} |F_n(x_{jk} + \theta) - F(x_{jk} + \theta)| \to 0\Big].$$

By the covering rule and by what precedes

$$PA_k^c = P\bigcup_{j=1}^{k} A_{jk}^c \le \sum_{j=1}^{k} PA_{jk}^c = 0$$

and, hence, $PA_k = 1$. Upon setting $A = \bigcap_{k=1}^{\infty} A_k$, it follows similarly
that $PA = 1$.

On the other hand, for every $x$ between $x_{jk}$ and $x_{j+1,k}$

$$F(x_{jk} + 0) \le F(x) \le F(x_{j+1,k}), \qquad F_n(x_{jk} + 0) \le F_n(x) \le F_n(x_{j+1,k})$$

while for every $x_{jk}$

$$0 \le F(x_{j+1,k}) - F(x_{jk} + 0) \le \frac{1}{k}.$$

Therefore,

$$F_n(x) - F(x) \le F_n(x_{j+1,k}) - F(x_{jk} + 0) \le F_n(x_{j+1,k}) - F(x_{j+1,k}) + \frac{1}{k}$$

and

$$F_n(x) - F(x) \ge F_n(x_{jk} + 0) - F(x_{j+1,k}) \ge F_n(x_{jk} + 0) - F(x_{jk} + 0) - \frac{1}{k}.$$

It follows that, whatever be $x$ and $k$,

$$|F_n(x) - F(x)| \le \sup_{1 \le j \le k} |F_n(x_{jk} + \theta) - F(x_{jk} + \theta)| + \frac{1}{k}$$

or

$$\Delta_n = \sup_{-\infty < x < +\infty} |F_n(x) - F(x)| \le \sup_{1 \le j \le k} |F_n(x_{jk} + \theta) - F(x_{jk} + \theta)| + \frac{1}{k}.$$

Hence $P[\Delta_n \to 0] \ge PA = 1$, and the theorem is proved.
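
For a random variable with finitely many values, the quantity $\sup_x |F_n(x) - F(x)|$ is easy to compute, since both functions are step functions jumping only at those values. The following sketch is an illustration under assumptions: the values, their probabilities, the seed, and the sample sizes are hypothetical.

```python
import random
from itertools import accumulate


def sup_distance(values, probs, sample):
    """sup over x of |F_n(x) - F(x)| for a variable with finitely many values.

    `values` must be sorted increasingly and contain every sampled value.
    Both F and F_n are step functions that jump only at these values, so the
    supremum is attained on the right-hand limits F(v + 0) = P[X <= v].
    """
    n = len(sample)
    F = list(accumulate(probs))
    F_n = list(accumulate(sample.count(v) / n for v in values))
    return max(abs(a - b) for a, b in zip(F_n, F))


values = [0, 1, 2, 3]         # hypothetical values of X
probs = [0.1, 0.2, 0.3, 0.4]  # hypothetical probabilities
rng = random.Random(1)        # fixed seed, for reproducibility only
for n in (100, 1_000, 20_000):
    sample = rng.choices(values, weights=probs, k=n)
    print(n, sup_distance(values, probs, sample))
```

The theorem asserts that, with probability 1, these distances tend to 0; a single simulated run only illustrates the tendency.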


*REMARK. The foregoing proof and hence the theorem remain valid
when the random variable $X$ is not elementary.

7. Need for nonelementary random variables. The sophisticated
mathematician prefers to work with "closed" models, such that the
operations defined for the entities within the model yield only entities
within the model. While elementary random variables can be obtained
as limits of sequences of simple random variables, limits of sequences
of simple and, more generally, of elementary random variables
are not necessarily elementary; families of elementary random
variables are not necessarily closed under passages to the limit. If this
closure is required, then the concept of a random variable has to be
extended so as to include "measurable functions." This will be done in
the following parts. In fact, the need for further expansion of the model
in order to include random variables with a noncountable set of values
appeared quite early in the development of probability theory, once
more in connection with the Bernoulli case. This is the celebrated (as
the reader observes, all results obtained in or used for the Bernoulli
case are "celebrated")


DE MOIVRE-LAPLACE THEOREM. In the Bernoulli case with $p > 0$,
$q = 1 - p > 0$, as $n \to \infty$,

de Moivre (1732):

$$P_n(x) = P[S_n = j] \sim \frac{1}{\sqrt{2\pi npq}}\, e^{-x^2/2}, \qquad x = \frac{j - np}{\sqrt{npq}},$$

uniformly on every finite interval $[a, b]$ of values of $x$;

Laplace (1801):

$$P\Big[a \le \frac{S_n - np}{\sqrt{npq}} \le b\Big] \to \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2}\,dx.$$

The relation $a_n \sim b_n$ means that $a_n / b_n \to 1$. The integer $j$ varies
with $n$, so that $x = x(n)$ remains within a fixed finite interval $[a, b]$ and

$$j = np + x\sqrt{npq} \to \infty, \qquad k = n - j = nq - x\sqrt{npq} \to \infty.$$
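We apply Stirling's formula

$$m! = \sqrt{2\pi m}\; m^m e^{-m} e^{\theta_m}, \qquad 0 < \theta_m < \frac{1}{12m},$$

to the binomial probabilities $P_n(x) = \dfrac{n!}{j!\,k!}\, p^j q^k$. Thus

$$P_n(x) = \frac{\sqrt{2\pi n}\; n^n e^{-n}\, e^{\theta_n - \theta_j - \theta_k}}{\sqrt{2\pi j}\; j^j e^{-j}\, \sqrt{2\pi k}\; k^k e^{-k}}\, p^j q^k = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{n}{jk}} \Big(\frac{np}{j}\Big)^j \Big(\frac{nq}{k}\Big)^k e^{\theta},$$

where, uniformly on $[a, b]$,

$$\frac{jk}{n} = n\Big(p + x\sqrt{\frac{pq}{n}}\Big)\Big(q - x\sqrt{\frac{pq}{n}}\Big) \sim npq \qquad \text{and} \qquad \theta = \theta_n - \theta_j - \theta_k \to 0.$$

Therefore, uniformly on $[a, b]$,

$$P_n(x) \sim \frac{1}{\sqrt{2\pi npq}} \Big(\frac{np}{j}\Big)^j \Big(\frac{nq}{k}\Big)^k$$

and

$$\log\Big\{\Big(\frac{np}{j}\Big)^j \Big(\frac{nq}{k}\Big)^k\Big\} = -(np + x\sqrt{npq}) \log\Big(1 + x\sqrt{\frac{q}{np}}\Big) - (nq - x\sqrt{npq}) \log\Big(1 - x\sqrt{\frac{p}{nq}}\Big)$$

$$= -(np + x\sqrt{npq}) \Big[x\sqrt{\frac{q}{np}} - \frac{1}{2} \frac{q x^2}{np} + O\Big(\frac{1}{n^{3/2}}\Big)\Big] - (nq - x\sqrt{npq}) \Big[-x\sqrt{\frac{p}{nq}} - \frac{1}{2} \frac{p x^2}{nq} + O\Big(\frac{1}{n^{3/2}}\Big)\Big]$$

$$= -\frac{x^2}{2} + O\Big(\frac{1}{\sqrt{n}}\Big).$$

The first assertion follows.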


Let $x_{nj}$ be those numbers of the form $\dfrac{j - np}{\sqrt{npq}}$ which belong to the
interval $[a, b]$; consecutive $x_{nj}$'s differ by $1/\sqrt{npq}$. On account of the
first assertion, uniformly in $j$,

$$P_n(x_{nj}) \sim \frac{1}{\sqrt{2\pi npq}}\, e^{-x_{nj}^2/2}$$

and

$$P\Big[a \le \frac{S_n - np}{\sqrt{npq}} \le b\Big] = \sum_j P_n(x_{nj}) \sim \sum_j \frac{1}{\sqrt{2\pi npq}}\, e^{-x_{nj}^2/2}.$$

Since the last expression is a Riemann sum approximating the integral
$\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_a^b e^{-x^2/2}\,dx$, the second assertion follows.
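
Both assertions of the theorem lend themselves to a numerical check. The sketch below is illustrative only: the parameters `n = 1000`, `p = 0.3`, the interval $[-1, 1]$, and the function names are assumptions, and the binomial probabilities are computed through logarithms (`math.lgamma`) to avoid overflow.

```python
from math import erf, exp, lgamma, log, pi, sqrt


def binomial_pmf(n, p, j):
    """P[S_n = j], computed through logarithms to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(j + 1) - lgamma(n - j + 1)
               + j * log(p) + (n - j) * log(1 - p))
    return exp(log_pmf)


def local_ratio(n, p, j):
    """Ratio of P[S_n = j] to the de Moivre approximation at x = (j - np)/sqrt(npq)."""
    npq = n * p * (1 - p)
    x = (j - n * p) / sqrt(npq)
    return binomial_pmf(n, p, j) / (exp(-x * x / 2) / sqrt(2 * pi * npq))


def interval_probability(n, p, a, b):
    """P[a <= (S_n - np)/sqrt(npq) <= b], summed over the admissible j."""
    s = sqrt(n * p * (1 - p))
    return sum(binomial_pmf(n, p, j) for j in range(n + 1)
               if a <= (j - n * p) / s <= b)


def normal_integral(a, b):
    """(1/sqrt(2 pi)) times the integral of exp(-x^2/2) over [a, b]."""
    return 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))


n, p = 1_000, 0.3  # illustrative parameters
print(local_ratio(n, p, 300), local_ratio(n, p, 320))
print(interval_probability(n, p, -1, 1), normal_integral(-1, 1))
```

The ratios are close to 1 on a bounded range of $x$, and the summed binomial probabilities approach the normal integral, in agreement with the Riemann-sum argument.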

1. Conditional probabilities. Let $A$ be an event with $PA > 0$. The
ratio $PAB/PA$ is called the conditional probability of $B$ given $A$ or,
simply, probability of $B$ given $A$ and is denoted by $P_A B$, so that

$$PAB = PA \cdot P_A B.$$

By induction we obtain the multiplication rule:

$$P(AB \cdots KL) = PA \cdot P_A B \cdots P_{AB \cdots K} L.$$

Furthermore, if $\sum_j A_j = \Omega$, then, from

$$PB = P\Omega B = \sum_j PA_j B,$$

follows the total probability rule:

$$PB = \sum_j PA_j \cdot P_{A_j} B.$$

Bayes' theorem,

$$P_B A_k = \frac{PA_k \cdot P_{A_k} B}{\sum_j PA_j \cdot P_{A_j} B},$$

follows upon replacing $PB$ by the foregoing expression in the relation

$$PA_k B = PA_k \cdot P_{A_k} B = PB \cdot P_B A_k.$$

All events which figure as subscripts are supposed to be of positive
probability. However, if, say, $P_A B$ is given, then every given $PA$, whether
zero or not, determines correctly $PAB$ by $PAB = PA \cdot P_A B$, since $PA = 0$
implies $PAB = 0$.
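
As a small worked instance of the total probability rule and of Bayes' theorem, consider the sketch below. The two-urn setup and all numbers in it are hypothetical, chosen only so that the arithmetic can be followed by hand; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def total_probability(prior, given):
    """PB = sum over j of PA_j * P_{A_j}B, for a partition {A_j}."""
    return sum(prior[j] * given[j] for j in prior)


def bayes(prior, given):
    """Return P_B A_k for every k; requires PB > 0."""
    pb = total_probability(prior, given)
    return {k: prior[k] * given[k] / pb for k in prior}


# Hypothetical example: two urns chosen with equal probability; B = "white ball".
prior = {"urn 1": Fraction(1, 2), "urn 2": Fraction(1, 2)}
given = {"urn 1": Fraction(3, 4), "urn 2": Fraction(1, 4)}

print(total_probability(prior, given))
print(bayes(prior, given))
```

Here $PB = \frac12 \cdot \frac34 + \frac12 \cdot \frac14 = \frac12$, so that the probability of the first urn given a white ball is $\frac{3/8}{1/2} = \frac34$.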

The set of all probabilities of events given a fixed $A$ with $PA > 0$
defines a function $P_A$ on $\mathcal{A}$, to be called the conditional probability given
$A$ or, simply, the probability given $A$. It follows at once from the
definition that $P_A$ obeys axiom II': it is normed, nonnegative, and
$\sigma$-additive on $\mathcal{A}$. Therefore, the pair $(\mathcal{A}, P_A)$ is a probability field given $A$
for which all definitions and general properties of probability fields
remain valid. In particular, if $X = \sum_j x_j I_{A_j}$ is an elementary random
variable, the expectation of $X$ with respect to $P_A$, or the conditional
expectation of $X$ given $A$ or, simply, the expectation of $X$ given $A$ is defined
by

$$E_A X = \sum_j x_j P_A A_j = \frac{1}{PA} \sum_j x_j PAA_j;$$

clearly, if $EX$ exists and is finite, then $E_A X$ exists and is finite. In
terms of trials, the probability field given $A$ represents the original
trial with the occurrence of outcome $A$ added to the original conditions.

It is easily verified that the events $A_j$ of a countable set are
independent if, and only if, for every finite subset $j_1, j_2, \cdots, j_k$ of indices

$$P_{A_{j_1} A_{j_2} \cdots A_{j_{k-1}}} A_{j_k} = PA_{j_k},$$

provided the "given" events have positive probability.

2. Asymptotically Bernoullian case. Let $A_n$, $n = 1, 2, \cdots$, be an
arbitrary sequence of events, and let $X_n = \dfrac{1}{n} \sum_{k=1}^{n} I_{A_k}$ be the random
frequency of occurrence of the $n$ first ones. We set

$$p_1(n) = \frac{1}{n} \sum_{k=1}^{n} PA_k, \qquad p_2(n) = \frac{2}{n(n-1)} \sum_{1 \le j < k \le n} PA_j A_k$$

so that $p_1(n)$ and $p_2(n)$ are bounded by 0 and 1. It follows, by
elementary computations, that

$$EX_n = p_1(n), \qquad \sigma^2 X_n = p_2(n) - p_1^2(n) + \frac{p_1(n) - p_2(n)}{n}.$$

In the Bernoulli case

$$d_n = p_2(n) - p_1^2(n) = p^2 - p^2 = 0,$$

and we can consider the quantity $d_n$ as some sort of measure of
"deviation" from the Bernoulli case. To make this precise, let us first prove a

KOLMOGOROV INEQUALITY. If $X$ is an elementary random variable
bounded by 1 (in absolute value), then, for every $\varepsilon > 0$,

$$P[|X| \ge \varepsilon] \ge EX^2 - \varepsilon^2.$$

We proceed as for the proof of Tchebichev's inequality: the inequality
follows from

$$EX^2 = E(X^2 I_{[|X| \ge \varepsilon]}) + E(X^2 I_{[|X| < \varepsilon]}) \le E I_{[|X| \ge \varepsilon]} + \varepsilon^2 = P[|X| \ge \varepsilon] + \varepsilon^2.$$

EXTENDED BERNOULLI LAW OF LARGE NUMBERS. Bernoulli's result,
that for every $\varepsilon > 0$

$$P[|X_n - EX_n| \ge \varepsilon] \to 0,$$

remains valid for the sequence of events $A_n$, independent or not, if, and
only if,

$$d_n = p_2(n) - p_1^2(n) \to 0.$$

Since $0 \le X_n \le 1$, so that $|X_n - EX_n| \le 1$, we can apply Kolmogorov's
inequality as well as Tchebichev's, so that

$$\sigma^2 X_n - \varepsilon^2 \le P[|X_n - EX_n| \ge \varepsilon] \le \sigma^2 X_n / \varepsilon^2.$$

Therefore, the asserted property holds if, and only if, $\sigma^2 X_n \to 0$. But

$$|\sigma^2 X_n - d_n| = \frac{|p_1(n) - p_2(n)|}{n} \le \frac{1}{n} \to 0,$$

and the extension follows.
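
The quantities $p_1(n)$, $p_2(n)$, $d_n$ and the two-sided bound obtained from the Kolmogorov and Tchebichev inequalities can be computed exactly on a small finite sample space. The sketch below assumes a hypothetical space of six equally likely points and four dependent events; none of these choices comes from the text.

```python
from fractions import Fraction
from itertools import combinations


def frequency_summary(prob, events):
    """Return (p1, p2, EX_n, variance of X_n) for X_n = (1/n) * sum of indicators.

    `prob` maps each sample point to its probability; `events` is a list of
    at least two sets of sample points.
    """
    n = len(events)
    P = lambda E: sum((prob[w] for w in E), Fraction(0))
    p1 = sum(P(A) for A in events) / n
    p2 = Fraction(2, n * (n - 1)) * sum(P(A & B) for A, B in combinations(events, 2))
    X = {w: Fraction(sum(w in A for A in events), n) for w in prob}
    mean = sum(prob[w] * X[w] for w in prob)
    variance = sum(prob[w] * (X[w] - mean) ** 2 for w in prob)
    return p1, p2, mean, variance


# Hypothetical dependent events on six equally likely sample points.
prob = {w: Fraction(1, 6) for w in range(6)}
events = [{0, 1, 2}, {1, 2, 3}, {0, 5}, {2, 4, 5}]
n = len(events)

p1, p2, mean, variance = frequency_summary(prob, events)
d_n = p2 - p1**2
print(mean == p1, variance == d_n + (p1 - p2) / n, abs(variance - d_n) <= Fraction(1, n))

# Two-sided bound: variance - eps^2 <= P[|X_n - EX_n| >= eps] <= variance / eps^2.
eps = Fraction(1, 4)
tail = sum(pw for w, pw in prob.items()
           if abs(Fraction(sum(w in A for A in events), n) - mean) >= eps)
print(variance - eps**2 <= tail <= variance / eps**2)
```

Exact fractions make the identity for $\sigma^2 X_n$ hold as an equality rather than up to rounding.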
If $d_n \to 0$ at least as fast as $1/n$, then (asymptotically) we are even
"closer" to the Bernoulli case. In fact,

EXTENDED BOREL STRONG LAW OF LARGE NUMBERS. If $d_n = O(1/n)$,
then Borel's result remains valid:

$$P[X_n - EX_n \to 0] = 1.$$

The hypothesis means that there exists a fixed finite number $c$ such that
$|n d_n| \le c$. Upon referring to the proof of Borel's result, we observe
that it suffices to show that $\sum_{k=1}^{\infty} \sigma^2 X_{k^2} < \infty$. Since

$$n \sigma^2 X_n \le |n d_n| + |p_1(n) - p_2(n)| \le c + 1,$$

it follows by setting $n = k^2$ that

$$\sum_{k=1}^{\infty} \sigma^2 X_{k^2} \le (c + 1) \sum_{k=1}^{\infty} \frac{1}{k^2} < \infty,$$

and the extension follows.

It is easily shown that both extensions apply to the events $A_n$ which
are independent but otherwise arbitrary.

3. Recurrence. The decomposition

$$\sigma^2 X_n = p_2(n) - p_1^2(n) + \frac{p_1(n) - p_2(n)}{n}$$

yields at once a proposition which leads very simply to the celebrated
Poincaré's recurrence theorem and its known refinements. Since
$\sigma^2 X_n \ge 0$ and $p_1(n)$, $p_2(n)$ are bounded by 0 and 1, it follows that, for
any fixed $\varepsilon > 0$, if $n \ge 1/\varepsilon$, then

$$p_2(n) = p_1^2(n) + \frac{p_2(n) - p_1(n)}{n} + \sigma^2 X_n \ge p_1^2(n) - \frac{1}{n} \ge p_1^2(n) - \varepsilon.$$

But $p_2(n)$ is the arithmetic mean of $PA_j A_k$ for $1 \le j < k \le n$. Therefore,

Whatever be the events $A_n$, if $n \ge 1/\varepsilon$, then there exist at least two events
$A_j$, $A_k$, $1 \le j < k \le n$, such that $PA_j A_k \ge p_1^2(n) - \varepsilon$.


In particular, if $PA_n \ge p > 0$ whatever be $n$, then every subsequence
of these events contains at least two events $A_j$, $A_k$ such that
$PA_j A_k \ge p^2 - \varepsilon$; if this inequality holds, we say that $A_j$ "$\varepsilon$-intersects" $A_k$. In
fact, there exists then a subsequence whose first term $\varepsilon$-intersects every
other term. For, if there is no such subsequence, then there exist
integers $m_n$ such that no event $A_n$ $\varepsilon$-intersects events $A_{n'}$ with $n' \ge n + m_n$,
no two events of the subsequence $A_{n_1}, A_{n_2}, A_{n_3}, \cdots$ with $n_1 = 1$,
$n_2 = n_1 + m_{n_1}$, $n_3 = n_2 + m_{n_2}$, $\cdots$, $\varepsilon$-intersect, and this contradicts the
particular case of the foregoing proposition. Thus, let $A_{11}, A_{21}, A_{31}, \cdots$,
be a subsequence such that the first term $\varepsilon$-intersects every other term.
Let $A_{12}, A_{22}, A_{32}, \cdots$, be a subsequence of $A_{21}, A_{31}, \cdots$, with same
property, and so on indefinitely. The sequence $A_{11}, A_{12}, \cdots$, is such
that every one of its terms $\varepsilon$-intersects every other term. Hence

RECURRENCE THEOREM. If $PA_n \ge p > 0$ whatever be $n$, then for every
$\varepsilon > 0$ there exists a subsequence of events $A_n$ such that $PA_j A_k \ge p^2 - \varepsilon$
whatever be the terms $A_j$, $A_k$ of this subsequence.

We observe that, if $PA_n = p$, then $PA_j A_k \ge p^2 - \varepsilon$ while, if the $A_n$
are two by two independent, then $PA_j A_k = p^2$. Thus, however small
be $\varepsilon > 0$, for every sequence $A_n$ of events, independent or not, there
exists a subsequence which behaves as if its terms were two by two
semi-independent up to $\varepsilon$ ("semi" only since we do not have necessarily
$PA_j A_k \le p^2 + \varepsilon$).

A phenomenological interpretation of the foregoing theorem is as
follows. Consider integer values of time and an incompressible fluid
in motion filling a container of unit volume. Any portion of the fluid
which at time 0 occupies a position $A$ of volume $PA = p > 0$ occupies
at time $n$ a position $A_n$ of same volume $PA_n = p$. The theorem says
that, for every $\varepsilon > 0$, the portion occupies in its motion an infinity of
positions such that the volume of the intersection of any two of these
positions is $\ge p^2 - \varepsilon$. In particular, if the motion is "second order
stationary," that is, $PA_j A_{j+k} = PAA_k$, then it intersects infinitely often
its initial position (this is Poincaré's recurrence theorem; he assumes
"stationarity") and the intersections may be selected to be of volume
$\ge p^2 - \varepsilon$ (this is Khintchine's refinement).

4. Chain dependence. There is a type of dependence, studied by
Markov and frequently called Markov dependence, which is of con- 
siderable phenomenological interest. It represents the chance (random, 
stochastic) analogue of nonhereditary systems, mechanical, optical, ---, 
whose known properties constitute the bulk of the present knowledge 
of laws of nature. 

A system is subject to laws which govern its evolution. For example,
a particle in a given field of forces is subject to Newton's laws of
motion, and its positions and velocities at times $1, 2, \cdots$, describe the
"states" (events) that we observe; crudely described, a very small
particle in a given liquid is subject to Brownian laws of motion, and its
positions (or positions and velocities) at times $t = 1, 2, \cdots$, are the
"states" (events) that we observe. While Newton's laws of motion are
deterministic in the sense that, given the present state of the particle,
the future states are uniquely determined (are sure outcomes), Brownian
laws of motion are stochastic in the sense that only the probabilities of
future states are determined. Yet both systems are "nonhereditary" in
the sense that the future (described by the sure outcomes or
probabilities of outcomes, respectively) is determined by the last observed state
only, the "present." It is sometimes said that nonhereditary systems
obey the "Huygens principle." The mathematical concept of
nonheredity in a stochastic context is that of Markov or chain dependence,
and appears as a "natural" generalization of that of independence.


Events $A_j$, where $j$ runs over an ordered countable set, are said to be
chained if the probability of every $A_j$ given any finite set of the
preceding ones depends only upon the last given one; in symbols, for every
finite subset of indices $j_1 < j_2 < \cdots < j_k$, we have

$$P_{A_{j_1} A_{j_2} \cdots A_{j_{k-1}}} A_{j_k} = P_{A_{j_{k-1}}} A_{j_k}.$$

Classes $\mathcal{C}_j = \{A_{j1}, A_{j2}, \cdots\}$ of events are said to be chained if events
$A_{jk}$ selected arbitrarily, one in each $\mathcal{C}_j$, are chained.

An elementary chain is a sequence of chained elementary partitions
$\sum_k A_{nk} = \Omega$, $n = 1, 2, \cdots$; in particular, if $X_n = \sum_k x_{nk} I_{A_{nk}}$ with
distinct $x_{n1}, x_{n2}, \cdots$ are elementary random variables, then the $X_n$
are said to be chained, or to form a chain, when the corresponding
partitions are chained.

It will be convenient to use a phenomenological terminology. Events
of the $n$th partition will be called states at time $n$, or at the $n$th step, of
the system described by the chain. The totality of all states of the
system is countable; we shall denote them by the letters $j, k, h, \cdots$,
and summations over, say, states $k$ will be over the set of all states,
unless otherwise stated.

The evolution of the system is described by the probabilities of its
states given the last known one. The probabilities $P^{m,n}_{jk}$ of passage
from a state $j$ at time $m$ to a state $k$ at time $m + n$ (in $n$ steps) form a
matrix $P^{m,n}$. Since "probability given $j$" is a probability, and the
probability given $j$ at time $m$ to pass to some state in $n$ steps is one,
we have

$$P^{m,n}_{jk} \ge 0, \qquad \sum_k P^{m,n}_{jk} = 1.$$

Furthermore, by the definition of chain dependence, the probability
given $j$ at time $m$ to pass to state $k$ in $n + n'$ steps equals the probability
given $j$ at time $m$ to pass to some state in $n$ steps and then to pass to
$k$ in $n'$ steps; we have

$$P^{m,n+n'}_{jk} = \sum_h P^{m,n}_{jh} P^{m+n,n'}_{hk}$$

or, in matrix notation,

$$P^{m,n+n'} = P^{m,n} P^{m+n,n'}.$$


An elementary chain is said to be constant if $P^{m,n}_{jk}$ is independent of $m$
whatever be $j$, $k$, and $n$. Then we denote this probability by $P^n_{jk}$ and
call it transition probability from $j$ to $k$ in $n$ steps. The corresponding
matrix $P^n$ is called transition matrix in $n$ steps; if $n = 1$ we drop it.
The foregoing relations become the basic constant chain relations:

$$P^n_{jk} \ge 0, \qquad \sum_k P^n_{jk} = 1, \qquad P^{n+n'}_{jk} = \sum_h P^n_{jh} P^{n'}_{hk}.$$

The last one can also be written as a matrix product $P^{n+n'} = P^n P^{n'}$.
Hence $P^n$ is the $n$th power of the transition matrix $P = P^1$, so that $P$
determines all transition probabilities. In fact, for an elementary chain
to be constant it suffices that the matrix $P^{m,1}$ be independent of $m$:
$P^{m,1} = P$, since then

$$P^{m,2} = P^{m,1} P^{m+1,1} = P^2, \qquad P^{m,3} = P^{m,2} P^{m+2,1} = P^3, \qquad \cdots.$$

We observe that in $P^n_{jk}$, and in every symbol to be introduced below,
superscripts are not power indices, unless so stated.
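
The basic constant chain relations can be verified mechanically for any given transition matrix. The sketch below uses a hypothetical three-state matrix and exact fractions; the helper names `mat_mul` and `mat_pow` are illustrative.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[j][h] * B[h][k] for h in range(size)) for k in range(size)]
            for j in range(size)]


def mat_pow(P, n):
    """n-step transition matrix P^n, with P^0 the unit matrix."""
    size = len(P)
    result = [[Fraction(int(j == k)) for k in range(size)] for j in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


half, quarter = Fraction(1, 2), Fraction(1, 4)
# Hypothetical three-state transition matrix; every row sums to 1.
P = [[half, half, 0],
     [quarter, half, quarter],
     [0, 1, 0]]

P5 = mat_pow(P, 5)
print(P5 == mat_mul(mat_pow(P, 2), mat_pow(P, 3)))  # P^{n+n'} = P^n P^{n'}
print([sum(row) for row in P5])                     # row sums of P^5
```

Since every $P^n$ is obtained from $P$ alone, the single matrix $P$ indeed determines all transition probabilities of a constant chain.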

We investigate the evolution of a system subject to constant chain laws
described by a transition matrix $P$. In particular, we want to find its
asymptotic behavior according to the state from which it starts. In
phenomenological terms the system is a nonhereditary one subject to
constant laws (independent of the time) and we ask what happens to
the system in the long run. The "direct" method we use, which requires
no special tools and has a definite appeal to the intuition, has
been developed by Kolmogorov (1936) and by Doblin (1936, 1937)
after Hadamard (1928) introduced it. But the concept of chain and
the basic pioneering work are due to Markov (1907).

*5. Types of states and asymptotic behavior. According to the total
probability rule and the definition of chain dependence, the probability
$Q^n_{jk}$ of passage from $j$ to $k$ in exactly $n$ steps, that is, without passing
through $k$ before the $n$th step, is given by

$$Q^n_{jk} = \sum_{h_1 \ne k,\ h_2 \ne k,\ \cdots,\ h_{n-1} \ne k} P_{jh_1} P_{h_1 h_2} \cdots P_{h_{n-1} k}.$$

The central relation in our investigation is

$$(1) \qquad P^n_{jk} = \sum_{m=1}^{n} Q^m_{jk} P^{n-m}_{kk}, \qquad n = 1, 2, \cdots;$$

the expressions $P^0_{kk} = 1$ (obtained for $m = n$) are the diagonal elements
of the unit matrix $P^0$.

The proof is immediate upon applying the total probability rule.
The system passes from $j$ to $k$ in $n$ steps if, and only if, it passes from $j$
to $k$ for the first time in exactly $m$ steps, $m = 1, 2, \cdots, n$, and then
passes from $k$ to $k$ in the remaining $n - m$ steps. These "paths" are
disjoint events, and their probabilities are given by $Q^m_{jk} P^{n-m}_{kk}$.
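
The central relation (1) can be checked exactly on a small chain. In the sketch below the first-passage probabilities are computed by the recursion $Q^1_{jk} = P_{jk}$, $Q^n_{jk} = \sum_{h \ne k} P_{jh} Q^{n-1}_{hk}$, which is one way of evaluating the sum over paths avoiding $k$; the transition matrix and the function names are hypothetical.

```python
from fractions import Fraction


def mat_pow(P, n):
    """n-step transition matrix P^n, with P^0 the unit matrix."""
    size = len(P)
    result = [[Fraction(int(j == k)) for k in range(size)] for j in range(size)]
    for _ in range(n):
        result = [[sum(result[j][h] * P[h][k] for h in range(size))
                   for k in range(size)] for j in range(size)]
    return result


def first_passage(P, k, N):
    """Q[n][j] = probability of reaching k from j for the first time at step n."""
    size = len(P)
    Q = [None, [P[j][k] for j in range(size)]]  # index 0 is unused
    for _ in range(2, N + 1):
        prev = Q[-1]
        Q.append([sum(P[j][h] * prev[h] for h in range(size) if h != k)
                  for j in range(size)])
    return Q


def central_relation_holds(P, N):
    """Check P^n_jk = sum_{m=1}^n Q^m_jk P^{n-m}_kk for all j, k and n <= N."""
    size = len(P)
    powers = [mat_pow(P, n) for n in range(N + 1)]
    for k in range(size):
        Q = first_passage(P, k, N)
        for j in range(size):
            for n in range(1, N + 1):
                rhs = sum(Q[m][j] * powers[n - m][k][k] for m in range(1, n + 1))
                if powers[n][j][k] != rhs:
                    return False
    return True


half, quarter = Fraction(1, 2), Fraction(1, 4)
P = [[half, half, 0],          # hypothetical transition matrix
     [quarter, half, quarter],
     [0, 1, 0]]
print(central_relation_holds(P, 6))
```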

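Summing over $n = 1, 2, \cdots, N$, the central relation yields

$$\sum_{n=1}^{N} P^n_{jk} = \sum_{n=1}^{N} \sum_{m=1}^{n} Q^m_{jk} P^{n-m}_{kk} = \sum_{m=1}^{N} \Big(Q^m_{jk} \sum_{n=m}^{N} P^{n-m}_{kk}\Big)$$

and, therefore,

$$\Big(1 + \sum_{n=1}^{N} P^n_{kk}\Big) \sum_{m=1}^{N} Q^m_{jk} \ge \sum_{n=1}^{N} P^n_{jk} \ge \Big(1 + \sum_{n=1}^{N-N'} P^n_{kk}\Big) \sum_{m=1}^{N'} Q^m_{jk}, \qquad N' < N.$$

It follows, upon dividing by $1 + \sum_{n=1}^{N} P^n_{kk}$ and letting first $N \to \infty$ and
then $N' \to \infty$, that

$$(2) \qquad \sum_{m=1}^{\infty} Q^m_{jk} = \lim_{N \to \infty} \frac{\sum_{n=1}^{N} P^n_{jk}}{1 + \sum_{n=1}^{N} P^n_{kk}};$$

in particular,

$$(3) \qquad 1 - \sum_{m=1}^{\infty} Q^m_{jj} = \lim_{N \to \infty} \frac{1}{1 + \sum_{n=1}^{N} P^n_{jj}}.$$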

The sum

$$q_{jk} = \sum_{m=1}^{\infty} Q^m_{jk}$$

is the probability, starting at $j$, of passing through $k$ at least once; for
$k = j$ it is the probability of returning to $j$ at least once. More generally,
the probability $q^n_{jk}$, starting at $j$, of passing through $k$ at least $n$ times is
given by

$$q^n_{jk} = \sum_{m=1}^{\infty} Q^m_{jk}\, q^{n-1}_{kk} = q_{jk}\, q^{n-1}_{kk}.$$

In particular, the probability $q^n_{jj}$ of returning to $j$ at least $n$ times is given
by

$$q^n_{jj} = q_{jj}\, q^{n-1}_{jj} = (q_{jj})^n.$$

Its limit,

$$(4) \qquad r_{jj} = \lim_{n \to \infty} (q_{jj})^n = 0 \text{ or } 1, \quad \text{according as } q_{jj} < 1 \text{ or } q_{jj} = 1,$$

is the probability of returning to $j$ infinitely often. It follows that the
probability, starting at $j$, of passing through $k$ infinitely often is

$$r_{jk} = \lim_{n \to \infty} q^n_{jk} = q_{jk} \lim_{n \to \infty} q^{n-1}_{kk} = q_{jk}\, r_{kk}$$

so that

$$(5) \qquad r_{jk} = 0 \text{ or } q_{jk} \quad \text{according as } q_{kk} < 1 \text{ or } q_{kk} = 1.$$

Upon singling out the states $j$ such that $q_{jj} = 0$ (noreturn) and $q_{jj} = 1$
(return with probability 1), we are led to two dichotomies of states:

$j$ is a return state or a noreturn state according as $q_{jj} > 0$ or $q_{jj} = 0$; $j$ is
a recurrent state or a nonrecurrent state according as $q_{jj} = 1$ or $q_{jj} < 1$
or, on account of (4), according as $r_{jj} = 1$ or $r_{jj} = 0$.

Clearly, noreturn states are boundary cases of nonrecurrent states and
recurrent states are boundary cases of return states. In terms of
transition probabilities, we have the following criteria.

RETURN CRITERION. A state $j$ is a return or a noreturn state according
as $P^n_{jj} > 0$ for at least one $n$ or $P^n_{jj} = 0$ for all $n$.

This follows at once from the fact that

$$\sup_n P^n_{jk} \le q_{jk} \le \sum_{n=1}^{\infty} P^n_{jk}.$$

RECURRENCE CRITERION. A state $j$ is a recurrent or a nonrecurrent
state according as the series $\sum_{n=1}^{\infty} P^n_{jj}$ is divergent or convergent.

This follows from (3).
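
The recurrence criterion and relation (3) can be illustrated on a two-state chain in which one state may be left for good. The sketch below is an assumption-laden illustration: the chain is hypothetical, the sums are truncated at $N = 60$, and the first-return probabilities are obtained by solving the central relation (1) for $Q^n_{jj}$.

```python
from fractions import Fraction


def return_probabilities(P, j, N):
    """Return ([P^n_jj], [Q^n_jj]) for n = 0..N; Q is obtained from relation (1)."""
    size = len(P)
    row = [Fraction(int(h == j)) for h in range(size)]  # row j of P^0
    p = [Fraction(1)]
    for _ in range(N):
        row = [sum(row[h] * P[h][k] for h in range(size)) for k in range(size)]
        p.append(row[j])
    q = [Fraction(0)]
    for n in range(1, N + 1):
        # (1) with k = j: P^n = Q^n P^0 + sum_{m<n} Q^m P^{n-m}
        q.append(p[n] - sum(q[m] * p[n - m] for m in range(1, n)))
    return p, q


half = Fraction(1, 2)
P = [[half, half],  # hypothetical chain: state 0 may be left for good,
     [0, 1]]        # state 1 is never left

for j in (0, 1):
    p, q = return_probabilities(P, j, 60)
    # partial sums of Q^m_jj (approximating q_jj) and of P^n_jj
    print(j, float(sum(q[1:])), float(sum(p[1:])))
```

For state 0 the series of the $P^n_{00}$ converges and the return probability is less than 1 (nonrecurrent); for state 1 the partial sums grow without bound and the return probability is 1 (recurrent).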


Less obvious types of states are described in terms of "mean
frequency of returns," as follows:

Let $\nu_{jk}$ be the passage time, from $j$ to $k$, taking values $m = 1, 2, \cdots$,
with probabilities $Q^m_{jk}$. If $q_{jk} = 1$, then $\nu_{jk}$ are elementary random
variables. If $q_{jk} < 1$, then, to avoid exceptions, we say that $\nu_{jk} = \infty$ with
probability $1 - q_{jk}$. The symbol $\infty$ is subject to the rules

$$\frac{1}{\infty} = 0, \quad \infty + c = \infty, \quad \text{and} \quad \infty \times c = \infty \text{ or } 0 \text{ according as } c > 0 \text{ or } c = 0.$$

We define the expected passage time $\tau_{jk}$ from $j$ to $k$ by

$$\tau_{jk} = \sum_{m=1}^{\infty} m\, Q^m_{jk} + \infty\, (1 - q_{jk});$$

we call $\tau_{jj}$ the expected return time to $j$ and the mean frequency of returns
to $j$ is $\dfrac{1}{\tau_{jj}}$.

We can now define the following dichotomy of states. A state $j$ is
null or positive according as $\dfrac{1}{\tau_{jj}} = 0$ or $\dfrac{1}{\tau_{jj}} > 0$. Clearly, a noreturn
and, more generally, a nonrecurrent state is null while a positive state
is recurrent.

We shall now establish a criterion for this new dichotomy of states
in terms of transition probabilities. To make it precise, we have to
introduce the concept of period of a state.


DEPENDENCE AND CHAINS 33 


Let $j$ be a return state; then let $d_j$ be the period of the $Q^n_{jj}$, that is,
the greatest integer such that a return to $j$ can occur with positive
probability only after multiples of $d_j$ steps: $Q^n_{jj} = 0$ for all $n \not\equiv 0$ (modulo
$d_j$), and $Q^{nd_j}_{jj} > 0$ for some $n$. Let $d'_j$ be the period of the $P^n_{jj}$ defined
similarly. We prove that $d_j = d'_j$ and call it the period (of return) of $j$.

The proof is immediate. If $Q^{nd_j}_{jj} > 0$, then $P^{nd_j}_{jj} \ge Q^{nd_j}_{jj} > 0$ so that
$d'_j \le d_j$. Thus, if $d_j = 1$, then $d'_j = 1$. If $d_j > 1$ and $r = 1, \cdots, d_j - 1$,
then the central relation yields

$$P^r_{jj} = 0, \qquad P^{d_j + r}_{jj} = Q^{d_j}_{jj} P^r_{jj} = 0,$$

$$P^{2d_j + r}_{jj} = Q^{d_j}_{jj} P^{d_j + r}_{jj} + Q^{2d_j}_{jj} P^r_{jj} = 0, \quad \text{etc.} \cdots,$$

so that $d'_j \ge d_j$ and, hence, $d'_j = d_j$.

If $j$ is a noreturn state, then we say that its period is infinite.


POSITIVITY CRITERION. A state $j$ is null or positive according as

$$\limsup_{n \to \infty} P^n_{jj} = 0 \text{ or } > 0.$$

More precisely, if $j$ is a null state, then $P^n_{jj} \to 0$, and if $j$ is a positive
state, then $P^{nd_j}_{jj} \to \dfrac{d_j}{\tau_{jj}} > 0$, while $P^n_{jj} = 0$ for all $n \not\equiv 0$ (modulo $d_j$).
Since the proof is involved, we give it in several steps.

1° If $j$ is nonrecurrent, then it is null and, by the recurrence
criterion, the series $\sum_n P^n_{jj}$ converges so that $P^n_{jj} \to 0$.

If $j$ is recurrent, then, by definition of its period $d_j$, $P^n_{jj} = 0$ for all
$n \not\equiv 0$ (modulo $d_j$). Therefore, it suffices to prove that, if $j$ is
recurrent, then $P^{nd_j}_{jj} \to \dfrac{d_j}{\tau_{jj}}$; for, if $j$ is null, then $\dfrac{1}{\tau_{jj}} = 0$ implies $\dfrac{d_j}{\tau_{jj}} = 0$,
and if $j$ is positive, then $\dfrac{d_j}{\tau_{jj}} > 0$.

Assume, for the moment, that, if the period $d_j$ of the positive
recurrent state $j$ is 1, then $P^n_{jj} \to \dfrac{1}{\tau_{jj}}$. In the general case, take $d_j$ for the
unit step and set $P' = P^{d_j}$, so that $P'^{\,n}_{jj} = P^{nd_j}_{jj}$; hence $Q'^{\,n}_{jj} = Q^{nd_j}_{jj}$. Then,
since $\tau'_{jj} = \sum_{n=1}^{\infty} n\, Q'^{\,n}_{jj} = \dfrac{\tau_{jj}}{d_j}$, the assertion follows by

$$P^{nd_j}_{jj} = P'^{\,n}_{jj} \to \frac{1}{\tau'_{jj}} = \frac{d_j}{\tau_{jj}}.$$

Thus, it suffices to prove that, if $j$ is recurrent with period $d_j = 1$, then

$$P^n_{jj} \to \frac{1}{\tau_{jj}}.$$
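2° Let $j$ be recurrent with $d_j = 1$. To simplify the writing, we drop
the subscripts $j$ and, to avoid confusion with matrices, we write
superscripts as subscripts. We follow now Erdös, Feller, and Pollard.

Let $\alpha = \limsup P_n$ so that there is a subsequence $n'$ of integers such
that $P_{n'} \to \alpha$ as $n' \to \infty$. Since $q = \sum_{m=1}^{\infty} Q_m = 1$, it follows that, given
$\varepsilon > 0$, there exists $n_\varepsilon$ such that, for $n \ge n_\varepsilon$, $\sum_{m=n+1}^{\infty} Q_m < \varepsilon$. Therefore,
for $n' \ge n \ge n_\varepsilon$ and every $p \le n$ with $Q_p > 0$, the central relation
yields

$$P_{n'} \le Q_p P_{n'-p} + \sum_{m \le n,\ m \ne p} Q_m P_{n'-m} + \varepsilon.$$

Since for $n'$ sufficiently large, $P_{n'} > \alpha - \varepsilon$ and $P_{n'-m} < \alpha + \varepsilon$ for
$m \le n$, it follows that

$$\alpha - \varepsilon \le Q_p P_{n'-p} + (1 - Q_p)(\alpha + \varepsilon) + \varepsilon;$$

hence

$$\alpha + \varepsilon - \frac{3\varepsilon}{Q_p} \le P_{n'-p} \le \alpha + \varepsilon.$$

Therefore, letting $n' \to \infty$ and then $\varepsilon \to 0$, we obtain $P_{n'-p} \to \alpha$,
and, repeating the argument, we have, for every fixed integer $m$,

$$P_{n'-mp} \to \alpha \quad \text{as } n' \to \infty.$$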

3° Let us assume, for the moment, that $Q_1 > 0$ so that $P_{n'-m} \to \alpha$
for every fixed $m$. We introduce the expected return time $\tau$ and use
the fact that $j$ is recurrent, so that, setting $q_n = \sum_{m=n+1}^{\infty} Q_m$, we have
$q_0 = 1$. The expected return time $\tau$ can be written

$$\tau = \sum_{m=1}^{\infty} m\, Q_m = \sum_{m=1}^{\infty} m (q_{m-1} - q_m) = \sum_{m=0}^{\infty} q_m,$$

and the central relation can be written

$$P_n = \sum_{m=1}^{n} Q_m P_{n-m} = \sum_{m=1}^{n} (q_{m-1} - q_m) P_{n-m}$$

so that

$$\sum_{m=0}^{n} q_m P_{n-m} = \sum_{m=0}^{n-1} q_m P_{n-1-m} = \cdots = q_0 P_0 = 1.$$

Therefore, for $n < n'$,

$$\sum_{m=0}^{n} q_m P_{n'-m} \le 1$$

and, letting $n' \to \infty$ and then $n \to \infty$, we obtain $\alpha \le 1/\tau$. If $\tau = \infty$,
then $\alpha = 0$; hence $P_n \to 1/\tau = 0$. Thus let $\tau < \infty$. The same
argument for $\beta = \liminf_{n \to \infty} P_n$ shows that, for a subsequence $n''$ such
that $P_{n''} \to \beta$ as $n'' \to \infty$, we have $P_{n''-m} \to \beta$ for every fixed $m$
and, from

$$\sum_{m=0}^{n} q_m P_{n''-m} + \varepsilon \ge 1 \quad \text{for } n_\varepsilon \le n < n'',$$

it follows as above that $\beta \ge \dfrac{1}{\tau}$. Therefore, $P_n \to \dfrac{1}{\tau}$, and the assertion
is proved under the assumption that $Q_1 > 0$.

4° To get rid of the last assumption, we appeal to elementary number theory. Consider the set of all those $p$ for which $Q_p > 0$. It contains a finite subset $\{p_i\}$ whose greatest common divisor is the period $d\ (= 1)$. As above, if $P_{n'} \to \alpha$, then $P_{n'-m_ip_i} \to \alpha$ for every fixed $m_i$ and $p_i$, and it follows that $P_{n'-m} \to \alpha$ for every fixed linear combination $m = \sum_i m_i p_i$. But every multiple of the period $md = m \ge \prod_i p_i$ can be written in this form, so that, starting with $n'$ sufficiently large, $P_{n'-m} \to \alpha$ for every fixed $m$, and the assertion follows as above. This concludes the proof.

Since, for a state $j$ with period $d_j$ there exists a finite number of integers $p_i$ such that $P_{jj}^{p_i} > 0$ and, for $m$ sufficiently large, $md_j = \sum_i m_i p_i$, it follows, by $P_{jj}^{md_j} \ge \prod_i (P_{jj}^{p_i})^{m_i} > 0$, that

If $d_j$ is the period of $j$, then $P_{jj}^{md_j} > 0$ for all sufficiently large values of $m$.

In other words, after some time elapses the system returns to $j$ with
positive probability after every interval of time $d_j$.
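
As a numerical illustration of the positivity criterion, the following sketch computes $P_{jj}^n$ by repeated multiplication, recovers the first-return probabilities $Q_{jj}^n$ from the central relation, and compares the limit of $P_{jj}^{nd_j}$ with $d_j/\tau_{jj}$. The two small transition matrices and all function names are assumptions chosen for the example; they do not come from the text.

```python
def n_step_return_probs(p, j, n_max):
    """Return [P_jj^0, ..., P_jj^n_max] by repeated vector-matrix products."""
    size = len(p)
    row = [1.0 if k == j else 0.0 for k in range(size)]  # row j of P^0
    probs = [1.0]
    for _ in range(n_max):
        row = [sum(row[h] * p[h][k] for h in range(size)) for k in range(size)]
        probs.append(row[j])
    return probs


def first_return_probs(p_n):
    """Solve the central relation P_n = sum_{m=1}^n Q_m P_{n-m} for the Q_m."""
    q = [0.0]
    for n in range(1, len(p_n)):
        q.append(p_n[n] - sum(q[m] * p_n[n - m] for m in range(1, n)))
    return q


# Hypothetical aperiodic two-state chain (d = 1).
aperiodic = [[0.9, 0.1], [0.5, 0.5]]
p_a = n_step_return_probs(aperiodic, 0, 200)
tau_a = sum(n * q for n, q in enumerate(first_return_probs(p_a)))

# Hypothetical chain of period d = 2 on the states 0, 1, 2.
periodic = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
p_b = n_step_return_probs(periodic, 0, 200)
tau_b = sum(n * q for n, q in enumerate(first_return_probs(p_b)))

print(p_a[200], 1 / tau_a)            # P_00^n against 1/tau_00
print(p_b[200], 2 / tau_b, p_b[199])  # P_00^{2n} against d/tau_00; odd steps vanish
```

The expected return time is approximated by a truncated series, which is adequate here because the first-return probabilities of both chains decay geometrically.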

We can now describe the asymptotic behavior of the system. If $k$ is a return state of period $d_k$, set

$$q_{jk}(r) = \sum_{m=0}^{\infty} Q_{jk}^{md_k + r}, \quad r = 1, 2, \ldots, d_k,$$

so that $q_{jk}(r)$ is the probability of passage from $j$ to $k$ in $n \equiv r$ (modulo $d_k$) steps and

$$\sum_{r=1}^{d_k} q_{jk}(r) = q_{jk}.$$


ASYMPTOTIC PASSAGE THEOREM. For every state $j$,

if $k$ is a null state, then $P_{jk}^n \to 0$;

if $k$ is a positive state, then $P_{jk}^{nd_k + r} \to q_{jk}(r)\,\dfrac{d_k}{\tau_{kk}}$;

and, whatever be the state $k$,

$$\frac{1}{n} \sum_{m=1}^{n} P_{jk}^m \to \frac{q_{jk}}{\tau_{kk}}.$$


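The theorem results from the positivity criterion and the central relation, as follows:

If $k$ is a null state, then $P_{kk}^n \to 0$. Therefore, from

$$0 \le P_{jk}^n \le \sum_{m=1}^{n'} Q_{jk}^m P_{kk}^{n-m} + \sum_{m=n'+1}^{n} Q_{jk}^m,$$

it follows, upon letting $n \to \infty$ and then $n' \to \infty$, that $P_{jk}^n \to 0$.

If $k$ is a positive state, then $P_{kk}^{nd_k + r} = 0$ for $r < d_k$ and $P_{kk}^{nd_k} \to d_k/\tau_{kk}$. Therefore, from

$$0 \le P_{jk}^{nd_k + r} - \sum_{m=0}^{n'} Q_{jk}^{md_k + r} P_{kk}^{(n-m)d_k} \le \sum_{m=n'+1}^{n} Q_{jk}^{md_k + r},$$

it follows, upon letting $n \to \infty$ and then $n' \to \infty$, that $P_{jk}^{nd_k + r} \to q_{jk}(r)\,d_k/\tau_{kk}$.

The last assertion follows from the first two assertions.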

*6. Motion of the system. To investigate the motion of the system 
we have to consider the probabilities of passage from one state to an- 
other. But, first, let us introduce a convenient terminology. 

A state $j$ is an everreturn state if, for every state $k$ such that $q_{jk} > 0$,
we have $q_{kj} > 0$. Two states $j$ and $k$ are equivalent and we write $j \sim k$
if $q_{jk} > 0$ and $q_{kj} > 0$; they are similar if they have the same period
and are of the same type. A class of similar states will be qualified
according to the common type of its states.

A class of states is indecomposable if any two of its states are equiva- 
lent, and it is closed if the probability of staying within the class is one. 
For example, the class of all states is closed but not necessarily inde- 
composable. 

The motion of the system is described by the foregoing asymptotic behavior of the probabilities of passage from a given state to another given state, and also by the following theorem.

DECOMPOSITION THEOREM. The class of all return states splits into equivalence classes which are indecomposable classes of similar states.

A not everreturn equivalence class is not closed. An everreturn equivalence class is closed; if its period $d > 1$, then it splits into $d$ cyclic subclasses $C(1), C(2), \ldots, C(d)$ such that the system passes from a state in $C(r)$ to a state in $C(r + 1)$ $(C(d + 1) = C(1))$ with probability 1.


The proof is simple but somewhat long. To begin with, we observe
that, if $j$ and $k$ are two equivalent states, distinct or not, then there
exist two integers, say $m$ and $p$, such that $P_{jk}^m > 0$, $P_{kj}^p > 0$.

1° The set of all states which are equivalent to some state coincides with the set of all return states. For, on the one hand, every return state is equivalent to itself and, on the other hand, if $j \sim k$, then $q_{jj} \ge P_{jj}^{m+p} \ge P_{jk}^m P_{kj}^p > 0$. Thus, the relation $j \sim k$, symmetric by definition, is reflexive: $j \sim j$. It is also transitive, for $j \sim k$ implies $P_{jk}^m > 0$, $k \sim h$ implies $P_{kh}^n > 0$ for some integer $n$ and, hence, $q_{jh} \ge P_{jh}^{m+n} \ge P_{jk}^m P_{kh}^n > 0$; similarly for $q_{hj}$. Therefore, the relation $j \sim k$ has the usual properties of an equivalence relation and the set of all return states splits into indecomposable equivalence classes.

We prove now that, if $j \sim k$, then they are similar. We know already that they are both return states; let $d_j$ and $d_k$ be their respective periods. There exists an integer $n$ such that $P_{kk}^n > 0$; hence $P_{kk}^{2n} \ge P_{kk}^n P_{kk}^n > 0$ and $P_{jj}^{m+n+p} \ge P_{jk}^m P_{kk}^n P_{kj}^p > 0$; similarly, $P_{jj}^{m+2n+p} > 0$. Therefore, $d_j$, being a divisor of $m + n + p$ and of $m + 2n + p$, is a divisor of every such $n$ and hence of $d_k$. By interchanging $j$ and $k$, it follows that $j$ and $k$ have the same period.

If $j$ is an everreturn state and $P_{kh}^n > 0$, then, from $P_{jh}^{m+n} \ge P_{jk}^m P_{kh}^n > 0$, it follows that there exists an integer $s$ such that $P_{hj}^s > 0$; hence $P_{hk}^{s+m} \ge P_{hj}^s P_{jk}^m > 0$, and $k$ is an everreturn state. By interchanging $j$ and $k$, it follows that they are both either everreturn or not everreturn states.

If $k$ is recurrent, then, by the recurrence criterion,

$$\sum_{n=1}^{\infty} P_{jj}^n \ge \sum_{n=1}^{\infty} P_{jj}^{m+n+p} \ge P_{jk}^m P_{kj}^p \sum_{n=1}^{\infty} P_{kk}^n = \infty$$

and $j$ is recurrent. By interchanging $j$ and $k$, it follows that they are both either recurrent or nonrecurrent.
If $d$ is the common period of the two equivalent states $j$ and $k$, then, from

$$P_{jj}^{m+nd+p} \ge P_{jk}^m P_{kk}^{nd} P_{kj}^p,$$

it follows that $d$ is a divisor of $m + p$ and $\lim_{n \to \infty} P_{kk}^{nd} > 0$ implies $\lim_{n \to \infty} P_{jj}^{nd} > 0$. Hence, upon applying the positivity criterion and then interchanging $j$ and $k$, both are either positive or null. This completes the proof of the first assertion.

2° If $j$ is a return but not an everreturn state, then there exists a state $h$ such that $q_{jh} > 0$ while $q_{hj} = 0$, so that $h$ is not equivalent to $j$ and there is a positive probability of leaving the equivalence class of $j$.

If $j$ is an everreturn state, then $q_{jk} > 0$ entails $q_{kj} > 0$ so that $k$ belongs to the equivalence class of $j$. Therefore, the probability of passage from $j$ to a state which does not belong to the equivalence class of $j$ is zero and, the class of all states being countable, the probability of leaving this class is zero.

Finally, we split an everreturn equivalence class $C$ of period $d > 1$ as follows: Let $j$ and $k$ belong to $C$. Since $P_{jj}^{m+p} \ge P_{jk}^m P_{kj}^p > 0$, $d$ is a divisor of $m + p$ and, if $m_1$ and $m_2$ are two values of $m$, then $m_1 \equiv m_2$ (modulo $d$). Thus, fixing $j$, to every $k$ belonging to $C$ there corresponds a unique integer $r = 1$ or $2, \ldots,$ or $d$ such that, if $P_{jk}^m > 0$, then $m \equiv r$ (modulo $d$). The states belonging to $C$ with the same value of $r$ form a subclass $C(r)$ and $C$ splits into subclasses $C(1), C(2), \ldots, C(d)$. It follows that, if $k$ and $k'$ belong respectively to $C(r)$ and $C(r')$, then $P_{kk'}^n$ can be positive only for $n \equiv r' - r$ (modulo $d$). Moreover, according to the proposition which follows the positivity criterion, $P_{kk'}^n > 0$ for all such $n$ sufficiently large. Thus no subclass $C(r)$ is empty and the system moves cyclically from $C(r)$ to $C(r + 1)$, $\ldots$, with $C(d + 1) = C(1)$. This proves the second assertion.
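
The splitting into cyclic subclasses can be sketched computationally. The function below is an illustration only: it assumes a finite indecomposable everreturn class given by a hypothetical transition matrix, obtains the period as the greatest common divisor of the quantities dist(u) + 1 - dist(v) over all possible transitions u to v (a standard graph technique, not the argument of the text), and groups the states by their residue modulo $d$. Residue 0 plays the role of $C(d)$, the subclass of the reference state.

```python
from collections import deque
from math import gcd


def cyclic_subclasses(p):
    """Period d and cyclic subclasses of a finite indecomposable class.

    p is a transition matrix in which every state is reachable from state 0.
    """
    size = len(p)
    dist = [None] * size          # breadth-first distances from state 0
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(size):
            if p[u][v] > 0 and dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    d = 0
    for u in range(size):
        for v in range(size):
            if p[u][v] > 0:
                d = gcd(d, dist[u] + 1 - dist[v])
    classes = [[] for _ in range(d)]
    for k in range(size):
        classes[dist[k] % d].append(k)
    return d, classes


# Hypothetical five-state chain of period 3.
chain = [
    [0.0, 0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
]
period, subclasses = cyclic_subclasses(chain)
print(period, subclasses)
```

Every transition of positive probability leads from the subclass of residue r to the subclass of residue r + 1 modulo the period, which is the cyclic motion described in the theorem.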


COROLLARY 1. The states of an everreturn equivalence class $C$ are linked in a constant chain whose transition matrix is obtained from the initial transition matrix $P$ by deleting all those $P_{jk}$ for which $j$ or $k$ or both do not belong to $C$.

COROLLARY 2. The states of a cyclic subclass $C(r)$ of an everreturn equivalence class with period $d$ are linked in a constant chain whose transition matrix $P'$ is obtained from $P^d$ by deleting all those $P_{jk}^d$ for which $j$ or $k$ or both do not belong to $C(r)$.

COROLLARY 3. An everreturn null equivalence class $C$ is either empty or infinite. In particular, a finite chain has no everreturn null states.

Let $C$ be finite nonempty. By the asymptotic passage theorem, $P_{jk}^n \to 0$ for $k \in C$. But $C$ is closed, so that $1 = \sum_{k \in C} P_{jk}^n \to 0$ for $j \in C$, and we reach a contradiction.

COROLLARY 4. If $j$ and $k$ are nonequivalent everreturn states, then $P_{jk}^n = 0$.

If $j$ and $k$ are equivalent positive states, with period $d$, then

$$P_{jk}^{nd+r} \to d/\tau_{kk} \quad \text{for some } r = r(j, k),$$
$$P_{jk}^{nd+r'} = 0 \quad \text{for } r' \not\equiv r \text{ (modulo } d).$$

This follows by the asymptotic passage theorem.

*7. Stationary chains. The evolution of a system is determined by the laws which govern the system. In the case of constant elementary chains these laws are represented by the transition matrix $P$ with elements $P_{jk}$. While $P$ determines probabilities of passage from one state to another, it does not determine the probability that at a given time the system be in a given state. To obtain such probabilities we have to know the initial conditions. In the deterministic case this is the state at time 0. In our case it is the probability distribution at time 0, that is, the set of probabilities $P_j$ for the system to be in the state $j$ at time 0. Then, according to the total probability rule, the probability $P_k^n$ that the system be in the state $k$ at time $n = 1, 2, \ldots,$ is

$$P_k^n = \sum_j P_j P_{jk}^n.$$

The notion of statistical equilibrium corresponds to the concept of stationarity in time. In our case of a constant elementary chain with transition matrix $P$, it is stationary if $P_k^n = P_k$ for every state $k$ and every $n = 1, 2, \ldots$.

Given the laws of evolution represented by a transition matrix, the 
problem arises whether or not there exist initial conditions represented 
by the initial probability distribution such that the chain is stationary; 
in other words, whether or not there exists a probability distribution 
{P;} which remains invariant under transitions. In general, one ex- 
pects that if, under given laws of evolution, an equilibrium is possible, 
then it is attained in the long run. To this somewhat vague idea corre- 
sponds the following 


INVARIANCE THEOREM. For states $j$ belonging to a cyclic subclass of a positive equivalence class with period $d$, the set of values $P_j = d/\tau_{jj}$ is an invariant and the only invariant distribution under the transition matrix of the subclass.


According to Corollary 2 of the decomposition theorem, it suffices to consider the chain formed by the subclass, that is, by one cyclic positive class with some transition matrix $P$ of period one. According to the asymptotic passage theorem,

$$P_{jk}^n \to P_k = \frac{1}{\tau_{kk}} > 0.$$

Since

$$\sum_k P_{jk}^n = 1 \quad \text{and} \quad P_{jk}^{n+1} = \sum_h P_{jh}^n P_{hk},$$

it follows, upon taking arbitrary but finite sets of states and letting $n \to \infty$, that

$$\sum_k P_k \le 1, \qquad P_k \ge \sum_h P_h P_{hk}.$$

But if, for some $k$, the second inequality is strict, then summing over all states $k$, we obtain

$$1 \ge \sum_k P_k > \sum_h P_h,$$

so that, ab contrario,

$$P_k = \sum_h P_h P_{hk}$$

and hence, by iteration, $P_k = \sum_h P_h P_{hk}^n$ for every $n$. Since $\sum_h P_h$ is finite, we can pass to the limit under the summation sign, so that, by letting $n \to \infty$, we obtain

$$P_k = \Big(\sum_h P_h\Big) P_k$$

and, $P_k$ being positive, it follows that $\sum_h P_h = 1$. Thus, the set of values $P_k$ is a probability distribution invariant under $P$.

It remains for us to prove that, if a set of values $P'_k$ has the same properties, then $P'_k = P_k$. But from

$$P'_k = \sum_h P'_h P_{hk}^n$$

it follows, as before, that $P'_k = \big(\sum_h P'_h\big) P_k = P_k$, and the conclusion is reached.
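
A small numerical check of the invariance theorem may help fix ideas. The sketch below assumes a hypothetical aperiodic three-state chain (so $d = 1$), approximates the limits $P_k$ by a high power of the transition matrix, and compares them with $1/\tau_{kk}$, where the expected return times are obtained by first-step analysis (an auxiliary technique introduced only for this illustration).

```python
def limit_row(p, j, steps=200):
    """Row j of P^steps, an approximation of the limits P_k of P_jk^n."""
    size = len(p)
    row = [1.0 if k == j else 0.0 for k in range(size)]
    for _ in range(steps):
        row = [sum(row[h] * p[h][k] for h in range(size)) for k in range(size)]
    return row


def mean_return_time(p, k, sweeps=2000):
    """tau_kk by first-step analysis, iterating the hitting-time equations."""
    size = len(p)
    m = [0.0] * size  # m[j]: expected number of steps from j != k to k
    for _ in range(sweeps):
        m = [0.0 if j == k else
             1.0 + sum(p[j][h] * m[h] for h in range(size) if h != k)
             for j in range(size)]
    return 1.0 + sum(p[k][h] * m[h] for h in range(size) if h != k)


# Hypothetical positive class of period one.
chain = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
limits = limit_row(chain, 0)
taus = [mean_return_time(chain, k) for k in range(3)]
invariance_gap = max(
    abs(limits[k] - sum(limits[h] * chain[h][k] for h in range(3)))
    for k in range(3)
)
print(limits, [1 / t for t in taus], invariance_gap)
```

For this chain the limits are proportional to (1, 2, 1), they sum to one, and they are left unchanged by one more transition, in agreement with the theorem.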
COROLLARY. If $C$ is a positive equivalence class, then

$$\sum_{j \in C} \frac{1}{\tau_{jj}} = 1.$$

This follows from

$$\sum_{j \in C} \frac{1}{\tau_{jj}} = \frac{1}{d} \sum_{r=1}^{d} \sum_{j \in C(r)} \frac{d}{\tau_{jj}} = 1.$$

STATIONARITY THEOREM. A constant elementary chain with transition matrix $P$ is stationary with initial probability distribution $\{P_k\}$ if, and only if, $P_k = 0$ for all null states $k$, and $P_k = p_t/\tau_{kk}$ for all states belonging to positive equivalence classes $C_t$, with $\sum_t p_t = 1$.
t 


Let the probability distribution $\{P_k\}$ be invariant under the transition matrix $P$ so that

$$P_k = \sum_j P_j P_{jk}^n.$$

If $k$ is a null state, then, by the asymptotic passage theorem, $P_{jk}^n \to 0$. $\sum_j P_j$ being finite, we can pass to the limit under the summation sign. It follows, upon letting $n \to \infty$, that $P_k = 0$. Hence, by summing over positive states only, $\sum' P_k = 1$.

If $k$ belongs to a positive equivalence class $C_t$, then, by the asymptotic passage theorem, we have that $P_{jk}^n = 0$ for every $j$ which does not belong to $C_t$ and $\frac{1}{n} \sum_{m=1}^{n} P_{jk}^m \to \frac{1}{\tau_{kk}}$ for every $j$ belonging to $C_t$. It follows that

$$P_k = \sum_{j \in C_t} P_j P_{jk}^n = \sum_{j \in C_t} P_j \Big(\frac{1}{n} \sum_{m=1}^{n} P_{jk}^m\Big) \to \frac{p_t}{\tau_{kk}}$$

where

$$p_t = \sum_{j \in C_t} P_j \quad \text{and} \quad \sum_t p_t = \sum{}' P_k = 1.$$

This proves the "only if" assertion.
Conversely, let the conditions on the $P_j$ hold and use

$$P_k^m = \sum_j{}' P_j P_{jk}^m$$

where the summation is over positive states $j$ only, since $P_j = 0$ for $j$ null.

Therefore, if $k$ is null, then $P_{jk}^m = 0$ and $P_k^m = 0$ for every $m$. If $k$ belongs to a positive equivalence class $C_t$, then, since $C_t$ is closed, $P_{jk}^m = 0$ for all states $j$ which do not belong to $C_t$, and, $C'_t$ being a finite subclass of $C_t$ such that $\sum P_j < \varepsilon$ with sum over $j \in C_t - C'_t$, we have

$$P_k^m = \sum_{j \in C_t} P_j P_{jk}^m \le p_t \sum_{j \in C'_t} P_{jk}^m/\tau_{jj} + \varepsilon.$$

Upon replacing $1/\tau_{jj}$ by the limit of the mean in the asymptotic passage theorem with subscripts $h, j \in C'_t$, we obtain, by summing first over the $j$,

$$P_k^m \le p_t \lim_{n \to \infty} \frac{1}{n} \sum_{l=1}^{n} \sum_{j \in C'_t} P_{hj}^l P_{jk}^m + \varepsilon \le p_t \lim_{n \to \infty} \frac{1}{n} \sum_{l=1}^{n} P_{hk}^{l+m} + \varepsilon.$$

Hence

$$P_k^m \le \frac{p_t}{\tau_{kk}} + \varepsilon,$$

so that, letting $\varepsilon \to 0$, we have $P_k^m \le p_t/\tau_{kk}$.

If, for some $k$, the inequality is a strict one, then, since for null states $P_k^m = 0$, it follows, by summing over positive states $k$ only, that

$$1 = \sum{}' P_k^m < \sum_t p_t \sum_{k \in C_t} \frac{1}{\tau_{kk}} = \sum_t p_t = 1.$$

Therefore, $P_k^m = p_t/\tau_{kk}$ for every $m$, and the "if" assertion is proved.


COMPLEMENTS AND DETAILS 


I. Physical statistics. The problem is to determine the state of equilibrium 
of a physical system, of energy E, composed of a very large number N 
of ‘“‘particles” of the same nature: electrons, protons, photons, mesons, neu- 
trons, etc. 

Hypotheses. There are $g_1$ microscopic states of energy $\varepsilon_1$, $g_2$ of energy $\varepsilon_2, \ldots$ and each particle is in one of these states. The macroscopic state, i.e., the state of the system, is specified by the number of particles at each energy level: $\nu_1$ particles of energy $\varepsilon_1$, $\nu_2$ particles of energy $\varepsilon_2, \ldots$. The set $\{\nu_1, \nu_2, \ldots\}$ is a set of random integers and the probability of a macroscopic state $\nu_1 = n_1, \nu_2 = n_2, \ldots$ is equal, up to a constant factor, to the number $W$ of ways in which $n_k$ particles can be distributed amongst $g_k$ microscopic states of energy $\varepsilon_k$, $k = 1, 2, \ldots,$ provided

$$\sum_k n_k = N, \qquad \sum_k n_k \varepsilon_k = E.$$


The Maxwell-Boltzmann statistics (classical theory of gases) is that of distinguishable particles without exclusion, i.e., without any bound upon the possible number of particles in any of the microscopic states. The Bose-Einstein statistics (photons, mesons, deuterons, and other particles with an integer "spin") is that of nondistinguishable particles without exclusion. The Fermi-Dirac statistics (electrons, protons, neutrons: particles with a semi-integer "spin") is that of nondistinguishable particles which obey the Pauli exclusion principle, that is, there cannot be more than one particle in any of the microscopic states.

Weights. Let $w$ denote the weight of the macroscopic state $\{n_1, n_2, \ldots\}$, i.e., $W/N!$ in the distinguishable case and $W$ in the nondistinguishable case. Prove that the combinatorial formulae give the following expressions for $w$, where it is assumed that $\sum_k n_k = N$, $\sum_k n_k \varepsilon_k = E$ (in the case of photons $N$ is not fixed and only the second condition remains):

                     Distinguishable particles                      Nondistinguishable particles

Without exclusion    $w = \prod_k g_k^{n_k} / \prod_k n_k!$              $w = \prod_k \frac{(g_k + n_k - 1)!}{n_k!\,(g_k - 1)!}$
                     (Maxwell-Boltzmann)                            (Bose-Einstein)

With exclusion       $w = \prod_k \frac{g_k!}{n_k!\,(g_k - n_k)!}$        $w = \prod_k \frac{g_k!}{n_k!\,(g_k - n_k)!}$
                     (corresponds to no physical reality)           (Fermi-Dirac)

When $g_k \gg n_k$, then the expressions of the weights in B.-E. and F.-D. statistics are equivalent to $w$ in M.-B. statistics. Assume distinguishability and let $c$ be the "capacity" coefficient of the microscopic states, that is, if there are already $n$ particles in the $g_k$ states of energy $\varepsilon_k$ $(k = 1, 2, \ldots)$, the number of these $g_k$ states which remains available for the $(n + 1)$th particle is $g_k - nc$; this is Brillouin statistics. The weights $w$ of the macroscopic states, previously defined as $w = W/N!$, are given by

$$w = \prod_k \frac{g_k (g_k - c) \cdots [g_k - (n_k - 1)c]}{n_k!}$$

and reduce to those of M.-B., B.-E., and F.-D. by giving to the parameter $c$ the values 0, $-1$, $+1$ respectively.
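
The reduction of the Brillouin weight to the three classical weights is easy to check with exact arithmetic. The sketch below is an illustration; the function names and the sample values of $g_k$ and $n_k$ are assumptions, not data from the text.

```python
from fractions import Fraction
from math import comb, factorial


def brillouin_weight(g, n, c):
    """w = prod_k g_k (g_k - c) ... [g_k - (n_k - 1) c] / n_k!"""
    w = Fraction(1)
    for g_k, n_k in zip(g, n):
        numerator = 1
        for i in range(n_k):
            numerator *= g_k - i * c
        w *= Fraction(numerator, factorial(n_k))
    return w


def maxwell_boltzmann(g, n):
    w = Fraction(1)
    for g_k, n_k in zip(g, n):
        w *= Fraction(g_k ** n_k, factorial(n_k))
    return w


def bose_einstein(g, n):
    w = 1
    for g_k, n_k in zip(g, n):
        w *= comb(g_k + n_k - 1, n_k)  # (g_k + n_k - 1)! / (n_k! (g_k - 1)!)
    return w


def fermi_dirac(g, n):
    w = 1
    for g_k, n_k in zip(g, n):
        w *= comb(g_k, n_k)            # g_k! / (n_k! (g_k - n_k)!)
    return w


g, n = (3, 4), (2, 1)  # hypothetical degeneracies and occupation numbers
print(brillouin_weight(g, n, 0), maxwell_boltzmann(g, n))
print(brillouin_weight(g, n, -1), bose_einstein(g, n))
print(brillouin_weight(g, n, 1), fermi_dirac(g, n))
```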

Statistical equilibrium. For a very large $N$ the equilibrium state of the macroscopic system is postulated to be the most probable one, that is, the one with the highest weight. Assume that Stirling's formula can be used for the factorials which figure in the table of weights above. Take the variation $\delta \log w$ which corresponds to the variation $\{\delta n_1, \delta n_2, \ldots\}$. Using the Lagrange multipliers method, the state which corresponds to the maximum of $w$ is determined by solving the system (prove)

$$\delta \log w + \lambda\,\delta N + \mu\,\delta E = 0,$$
$$\sum_k n_k = N, \qquad \sum_k n_k \varepsilon_k = E.$$

(In the case of photons take $\lambda = 0$ and suppress the second relation.) The equilibrium states for the various statistics are also obtained by replacing $c$ by 0, $-1$, and 1 in the equilibrium state for Brillouin statistics, given by

$$n_k = g_k/(e^{\lambda + \mu \varepsilon_k} + c)$$

where $\lambda$ and $\mu$ are determined by the subsidiary conditions

$$N = \sum_k g_k/(e^{\lambda + \mu \varepsilon_k} + c), \qquad E = \sum_k g_k \varepsilon_k/(e^{\lambda + \mu \varepsilon_k} + c).$$


The Planck-Bose-Allard method. The macroscopic states can be described in a more precise manner. Instead of asking for the number $n_k$ of particles in the states of energy $\varepsilon_k$, we ask for the number $g_{km}$ of states of energy $\varepsilon_k$ occupied by $m$ particles. The particles are assumed to be nondistinguishable as required by modern physics. The combinatorial formulae give

$$w = \prod_k \Big(g_k! \Big/ \prod_m g_{km}!\Big) \quad \text{with} \quad g_k = \sum_m g_{km}, \quad N = \sum_{k,m} m\,g_{km}, \quad E = \sum_{k,m} \varepsilon_k m\,g_{km}.$$

To obtain the statistical equilibrium state use the procedure described above.

B.-E. statistics is obtained if no bounds are imposed upon the values of $m$. F.-D. statistics is obtained if $m$ can take only the values 0 or 1; "intermediate" statistics is obtained if $m$ can take only the values belonging to a fixed set of integers.

In the equilibrium state (with $c = -1$ or $+1$ when the statistics are B.-E.'s or F.-D.'s respectively), we have

$$g_{km} = g_k (1 + c a_k)^{-c} a_k^m, \quad \text{where } a_k = e^{-(\lambda + \mu \varepsilon_k)}$$

and $g_k(u)$, determined by the usual subsidiary conditions, the generating function of the number of particles in a microscopic state of energy $\varepsilon_k$, is

$$g_k(u) = (1 + c a_k)^{-c} (1 + c a_k u)^{c}.$$


II. The method of indicators. 

1. Rule: In order to compute $PB$, $B = f(A_1, A_1^c, \ldots, A_m, A_m^c)$, take the following steps:

(a) Reduce the operations on events to complementations, intersections, and sums;

(b) Replace each event by its indicator, expand, and take the expectation.

In this way find

$$P\Big(\bigcup_{j=1}^{m} A_j\Big) \quad \text{and} \quad P\Big(\bigcap_{j=1}^{k} A_j \cap \bigcap_{j=k+1}^{m} A_j^c\Big) \quad \text{in terms of the } P\Big(\bigcap_{i=1}^{r} A_{j_i}\Big)\text{'s}.$$

Notations. Let $I_{A_j} = I_j$ and let $R = \sum_{j=1}^{m} I_j$ be the "repetition" of $A_j$'s, that is, the number of events $A_j$ which occur. Let $J_0 = 1$, $J_r = \sum I_{j_1} \cdots I_{j_r}$, where the summation is over all combinations $1 \le j_1 < j_2 < \cdots < j_r \le m$. Let $I_{[r]}$ and $I_{(r)}$ be indicators of the events exactly $r$ $A$'s occur and at least $r$ $A$'s occur, respectively; set

$$S_r = EJ_r, \quad P_{[r]} = EI_{[r]}, \quad P_{(r)} = EI_{(r)}.$$

2. Prove

(a) $\sum_{r=0}^{m} u^r I_{[r]} = \sum_{k=0}^{m} (u - 1)^k J_k$

and deduce

(b) $P_{[r]} = \sum_{k=r}^{m} (-1)^{k-r} C_k^r S_k$,

(c) $S_k = \sum_{r=k}^{m} C_r^k P_{[r]} = \sum_{r=k}^{m} C_{r-1}^{k-1} P_{(r)}$,

(d) $R(R - 1) \cdots (R - k + 1) = k!\,J_k$.
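
Relations 2(b) and 2(c) can be checked on any finite probability space by direct enumeration. The sketch below does so with exact fractions; the sample space of twelve equally likely points and the four events are arbitrary assumptions made for the illustration, and $C_k^r$ is read as the binomial coefficient "k choose r".

```python
from fractions import Fraction
from itertools import combinations
from math import comb

omega = range(12)  # hypothetical sample space, equally likely points
events = [
    {w for w in omega if w % 2 == 0},
    {w for w in omega if w % 3 == 0},
    {w for w in omega if w < 5},
    {1, 2, 11},
]
m = len(events)


def prob(subset):
    return Fraction(len(subset), len(omega))


def S(k):
    """S_k = E J_k: sum of P(A_{j_1} ... A_{j_k}) over all k-combinations."""
    if k == 0:
        return Fraction(1)
    return sum(prob(set.intersection(*combo)) for combo in combinations(events, k))


# P_[r]: exactly r events occur; P_(r): at least r events occur.
counts = [0] * (m + 1)
for w in omega:
    counts[sum(w in e for e in events)] += 1
p_exact = [Fraction(c, len(omega)) for c in counts]
p_at_least = [sum(p_exact[r:]) for r in range(m + 1)]

relation_b = all(
    p_exact[r] == sum((-1) ** (k - r) * comb(k, r) * S(k) for k in range(r, m + 1))
    for r in range(m + 1)
)
relation_c = all(
    S(k)
    == sum(comb(r, k) * p_exact[r] for r in range(k, m + 1))
    == sum(comb(r - 1, k - 1) * p_at_least[r] for r in range(k, m + 1))
    for k in range(1, m + 1)
)
print(relation_b, relation_c)
```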


3. Let $k \le r \le m$. Using 2(c) and the relations

$$I_{(m)} \le \cdots \le I_{(r)} \le I_{(r-1)} \le \cdots \le I_{(1)},$$

prove that

$$(S_k - C_{r-1}^k)/(C_m^k - C_{r-1}^k) \le P_{(r)} \le S_k/C_r^k.$$

Examine the special case $r = m$; the left-hand side becomes Gumbel's inequality; the right-hand side becomes Fréchet's inequality.

Let

$$f(k) = 1 - S_k/C_m^k, \qquad \Delta f(k) = f(k + 1) - f(k).$$
4, Prove 
m—1 Cok ' 
(a) AJ(k) = » Ct, lta, 
(=k 
Cn 
(b) Iin SE AIO, RSS m1, 
Cn—k-1 
(c) Aj(k) 2 0; 
deduce a scale of inequalities for the S;’s. 
5, 
J(k) “ So Gms k-1 _ \ 
(a) mA (>) = 0 a Lo), 
Ch, = k 
(b) LI <2 (-ma“®), 
Cn pet 
(© at® > 0; 


deduce another scale of inequalities for S;’s. 

6. The general symbolic method. The events $B_1, \ldots, B_m$ are called exchangeable if $P(B_{i_1} \cdots B_{i_r} B_{i_{r+1}}^c \cdots B_{i_{r+s}}^c)$ depends only on the number $r$ of events $B_i$ and on the number $s$ of events $B_i^c$.

Let

$$J_{r/s} = \sum I_{A_{i_1}} \cdots I_{A_{i_r}} I_{A_{i_{r+1}}^c} \cdots I_{A_{i_{r+s}}^c},$$
$$S_{r/s} = E(J_{r/s}) = \sum P(A_{i_1} \cdots A_{i_r} A_{i_{r+1}}^c \cdots A_{i_{r+s}}^c),$$
$$p_{r/s} = P(B_{i_1} \cdots B_{i_r} B_{i_{r+1}}^c \cdots B_{i_{r+s}}^c).$$

If we choose the $B_i$'s such that

$$\sum P(A_{i_1} \cdots A_{i_r} A_{i_{r+1}}^c \cdots A_{i_{r+s}}^c) = \sum P(B_{i_1} \cdots B_{i_r} B_{i_{r+1}}^c \cdots B_{i_{r+s}}^c),$$

then

$$p_{r/s} = S_{r/s}/C_m^r C_{m-r}^s.$$

If we further introduce symbolic independent events having the same probability $p$, the complementary events having probability $q = 1 - p$, then, symbolically,

$$p_{r/s} = p^r q^s.$$

The symbolic method consists of the following steps:

(a) In any given identity (or identical inequality) for $p$, $q$ $(0 \le p \le 1$, $q = 1 - p)$ replace $p^r q^s$ by $p_{r/s}$.

(b) Replace $p_{r/s}$ by $S_{r/s}/C_m^r C_{m-r}^s$ and obtain an equality (or inequality resp.) for the $S_{r/s}$'s.

Examples:

(a) Starting from $p^r q^s = p^r (1 - p)^s$ obtain

$$\frac{S_{r/s}}{C_m^r C_{m-r}^s} = \sum_{k=0}^{s} (-1)^k \frac{C_s^k}{C_m^{r+k}}\, S_{r+k/0};$$

in the special case $r + s = m$, find

$$S_{r/m-r} = P_{[r]}.$$

(b) Starting from $p^r q^s = p^r q^s (p + q)^{m-r-s}$, obtain

$$S_{r/s} = \sum_{i=r}^{m-s} C_i^r C_{m-i}^s S_{i/m-i}.$$

In the special case $s = 0$, find

$$S_{r/0} = S_r = \sum_{i=r}^{m} C_i^r P_{[i]}.$$

(c) Starting from $p^r q^s \le p^{r'} q^{s'}$, $r' \le r$, $s' \le s$, find

$$\frac{S_{r'/s'}}{C_m^{r'} C_{m-r'}^{s'}} \ge \frac{S_{r/s}}{C_m^r C_{m-r}^s}, \qquad r' \le r,\ s' \le s,$$

and as a special case the scale of inequalities (4c).


. 
(d) Starting from 1 = >>’ C?p"~*g‘ where >)’ denotes a sum in which a certain 
i=0 
number of terms is omitted, find 


Oe = > Sr—1fi 


and, taking only the terms 7 = 0 and i = 1, find the second scale of inequalities 
(5c). 

7. The classical problem of matching. This problem (problème des rencontres) was studied first by Montmort (1708) and further treated by Lambert, Euler, and others in different forms, all of which can be described by the following setup: given $m$ distinct numbers $X_1, X_2, \ldots, X_m$, choose at random a first $X_{i_1}$, then a second $X_{i_2}$ from the remaining ones, etc. A match (coincidence, rencontre) is an event $A_i$ which consists in choosing exactly $X_i$ at the $i$th draw.

In the following, assume that each permutation $(X_{i_1} \cdots X_{i_m})$ has the same probability of being chosen at random. Show that

(a) $P(A_{i_1} \cdots A_{i_r}) = \dfrac{1}{C_m^r\, r!}$,

(b) $P_{[r]} = \dfrac{1}{r!} \sum_{k=0}^{m-r} \dfrac{(-1)^k}{k!}$.

(c) Find $\lim_{m \to \infty} P_{[r]}$; interpretation? Show that $P_{[m-1]} = 0$; interpretation?

(d) Show that $E(R) = \sum_{r=0}^{m} r P_{[r]} = S_1 = 1$ and $E(R - ER)^2 = 1$. (Use the generating function $\sum_{r=0}^{m} u^r P_{[r]}$.)
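
For a small number of draws the matching distribution can be obtained by listing all permutations. The following sketch (with the assumed value $m = 6$ and hypothetical function names) compares the enumerated $P_{[r]}$ with formula (b) and computes the first two moments of the number $R$ of matches.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def match_distribution(m):
    """[P_[0], ..., P_[m]] by enumeration of the m! equally likely permutations."""
    counts = [0] * (m + 1)
    for perm in permutations(range(m)):
        counts[sum(1 for i, x in enumerate(perm) if i == x)] += 1
    return [Fraction(c, factorial(m)) for c in counts]


def match_formula(m, r):
    """P_[r] = (1/r!) * sum_{k=0}^{m-r} (-1)^k / k!"""
    return Fraction(1, factorial(r)) * sum(
        Fraction((-1) ** k, factorial(k)) for k in range(m - r + 1)
    )


m = 6
dist = match_distribution(m)
mean = sum(r * p for r, p in enumerate(dist))
variance = sum((r - mean) ** 2 * p for r, p in enumerate(dist))
print(dist, mean, variance)
```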


Ill. Random walk. A particle starting at some point of an m-dimensional 
space moves in such a way that its consecutive displacements can be repre- 
sented by independent m-dimensional random vectors. Problems of the fol- 
lowing type arise: find the probability that in time T or before time T the par- 
ticle reaches a certain domain D, or that it reaches D without having reached 
previously a domain D’, or find the expected time for the particle to reach D, 
etc.:-- 

We give a few examples which show the great variety of forms under which 
this problem occurs, questions which can be asked, and methods of solution. 
We restrict ourselves to the discontinuous case with every move taking one 
unit of time. 

1. Game of “heads or tails’ and combinatorial method. To n tosses of a coin 
with equal probabilities for heads and for tails we associate the score point whose 
coordinates are respectively the number of heads and the number of tails which 
occur. Thus, at every toss, the score point M moves by one unit either upwards 
or to the right, and the game is represented by a two-dimensional one-sided 
random walk on the lattice of points with integer coordinates. 

The score points corresponding to the same number $n$ of tosses lie on the line $x + y = n$. The total number of paths between 0 and $M = (a, b)$ is $\dfrac{(a + b)!}{a!\,b!}$.

(a) If $A$ and $B$ ran for office, $A$ got $a$ votes and $B$ got $b < a$ votes, find the pr. $P$ that in counting the votes $A$ be always ahead of $B$.

(Equivalent to the pr. that the score point stays below the bisectrix until it reaches the point $M = (a, b)$. Compute the pr. of the complementary event by applying the symmetry principle of Désiré André as follows: the paths from 0 to $M$ which intersect the bisectrix either go through (1, 0) or through (0, 1). By reason of symmetry both classes contain the same number of paths. The number of those which go from (0, 1) to $M$ is $(a + b - 1)!/a!\,(b - 1)!$, and $P = \dfrac{a - b}{a + b}$.)

(b) The probability that there be neither gain nor loss for the first time in exactly $2n$ tosses is $\dfrac{1 \cdot 3 \cdot 5 \cdots (2n - 3)}{2 \cdot 4 \cdot 6 \cdots 2n}$. (Start with the number of paths from 0 to $(n, n - 1)$ which do not intersect the bisectrix.)

(c) The probability that the gambler who bets on heads and whose fortune is $m$ times the stake loses his fortune in $m + 2n$ tosses is $m(m + n + 1) \cdots (m + 2n - 1)/2^{m+2n}\,n!$. (Reduce to (a) by taking for origin the point $(m + n, n)$.)
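
The ballot probability of 1(a) and the first-return probability of 1(b) can both be confirmed by brute-force enumeration of paths for small parameters. The sketch below is an illustration under the assumed values $a = 5$, $b = 3$ and $n = 4$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def ballot_probability(a, b):
    """Fraction of counting orders in which A stays strictly ahead of B."""
    total = favorable = 0
    for a_positions in combinations(range(a + b), a):
        a_votes = set(a_positions)
        lead = 0
        ahead = True
        for t in range(a + b):
            lead += 1 if t in a_votes else -1
            if lead <= 0:
                ahead = False
                break
        total += 1
        favorable += ahead
    return Fraction(favorable, total)


def first_return_probability(n):
    """Probability that the first equalization occurs at toss 2n."""
    count = 0
    for steps in product((1, -1), repeat=2 * n):
        position = 0
        for t, step in enumerate(steps, start=1):
            position += step
            if position == 0:
                break
        if position == 0 and t == 2 * n:
            count += 1
    return Fraction(count, 2 ** (2 * n))


def first_return_formula(n):
    """1*3*5*...*(2n-3) / (2*4*6*...*2n)"""
    numerator = 1
    for i in range(1, n):
        numerator *= 2 * i - 1
    denominator = 1
    for i in range(1, n + 1):
        denominator *= 2 * i
    return Fraction(numerator, denominator)


print(ballot_probability(5, 3), Fraction(5 - 3, 5 + 3))
print(first_return_probability(4), first_return_formula(4))
```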

2. Gambler’s ruin. 

(a) Method of difference equations. Consider a one-dimensional random walk on the lattice $x = 0, \pm 1, \pm 2, \ldots$. At each step the particle at $y$ has probability $p_k$ to move from $y$ to $y + k$, $k = 0, \pm 1, \pm 2, \ldots$. Let $P_x$ be the probability of ruin, that is, starting at $x$ with $0 < x < a$ to arrive at $y \le 0$ before reaching $y \ge a$. Then $P_x = \sum_y P_y p_{y-x}$ with boundary conditions $P_y = 1$ if $y \le 0$ and $P_y = 0$ if $y \ge a$.

The gambler has $x$ dollars and wins or loses one dollar with respective probabilities $p$ and $q = 1 - p$. Find the probability $P_x$ of his ruin. Find the probability $P_{x,n}$ of his ruin at the $n$th game.

In the first case, $P_x = pP_{x+1} + qP_{x-1}$ with $P_0 = 1$, $P_a = 0$. The solution is

$$P_x = \frac{(q/p)^a - (q/p)^x}{(q/p)^a - 1} \quad \text{for } p \ne q \quad \text{and} \quad P_x = 1 - \frac{x}{a} \quad \text{for } p = q.$$

In the second case $P_{x,n+1} = pP_{x+1,n} + qP_{x-1,n}$ with $P_{0,n} = P_{a,n} = 0$ and $P_{0,0} = 1$, $P_{x,0} = 0$. The solution is

$$P_{x,n} = a^{-1} 2^n p^{(n-x)/2} q^{(n+x)/2} \sum_{k=1}^{a-1} \cos^{n-1} \frac{\pi k}{a}\, \sin \frac{\pi k}{a}\, \sin \frac{\pi x k}{a}.$$

(b) Method of matrices. Same random walk but with $p_1 = p_{-1} = 1/2$. The particle starts from 0 and dies when it attains $a - 1 < 0$ or $b + 1 > 0$, where $b = a + c$. Find the probability $P_n$ that after $n$ displacements the particle is still alive, as follows.

Set $g(k) = 1/2$ for $k = \pm 1$ and $g(k) = 0$ otherwise. Then $P_n = \sum g(k_1) \cdots g(k_n)$ where the sum is taken over all $k$'s such that $a \le \sum_{j=1}^{h} k_j \le b$, $h = 1, 2, \ldots, n$. Set $d_j = k_1 + \cdots + k_j - a$. Then $P_n$ is the sum of the elements of the $(1 - a)$-th column or row of the matrix $A^n$ where

$$A = (g(i - j)) = \begin{pmatrix} 0 & \tfrac12 & 0 & 0 & \cdots \\ \tfrac12 & 0 & \tfrac12 & 0 & \cdots \\ 0 & \tfrac12 & 0 & \tfrac12 & \cdots \\ \cdot & \cdot & \cdot & \cdot & \cdots \end{pmatrix}.$$

The proper values $\lambda_j$ of $A$ are given by $\lambda_j = \cos \dfrac{j\pi}{c + 2}$, the proper values of $A^n$ are $\lambda_j^n$, and

$$P_n = \frac{2}{c + 2} \sum_j{}' \cos^n \frac{j\pi}{c + 2}\, \sin \frac{(1 - a) j\pi}{c + 2}\, \cot \frac{j\pi}{2(c + 2)},$$

where $\sum'$ denotes summation over the odd $j$'s only.
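
The survival probability can be computed both by repeated multiplication by the matrix $A$ and from the proper values. The sketch below assumes the alive region $a \le$ position $\le b$ with the illustrative values $a = -2$, $b = 3$, and evaluates the spectral sum over the odd $j$ from 1 to $c + 1$; the function names are hypothetical.

```python
from math import cos, pi, sin, tan


def survival_by_matrix(a, b, n):
    """Sum of the row of A^n indexed by the starting point 0 (index -a)."""
    size = b - a + 1                     # positions a, a+1, ..., b
    vec = [0.0] * size
    vec[-a] = 1.0                        # particle starts from 0
    for _ in range(n):
        new = [0.0] * size
        for i, mass in enumerate(vec):
            if i > 0:
                new[i - 1] += 0.5 * mass
            if i < size - 1:
                new[i + 1] += 0.5 * mass
        vec = new                        # mass stepping outside [a, b] is lost
    return sum(vec)


def survival_by_proper_values(a, b, n):
    """(2/(c+2)) * sum over odd j of cos^n(t_j) sin((1-a) t_j) cot(t_j / 2)."""
    c = b - a
    total = 0.0
    for j in range(1, c + 2, 2):         # odd j = 1, 3, ..., up to c + 1
        t = j * pi / (c + 2)
        total += cos(t) ** n * sin((1 - a) * t) / tan(t / 2)
    return 2.0 * total / (c + 2)


a, b = -2, 3
gaps = [
    abs(survival_by_matrix(a, b, n) - survival_by_proper_values(a, b, n))
    for n in range(26)
]
print(max(gaps), survival_by_matrix(a, b, 10))
```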
IV. Geometric probabilities.

Elementary probabilities. Consider an $n$-dimensional space of points $U = (u_1, \ldots, u_n)$ and let $G$ be a group of transformations of points into points. If there exists a differential element $d\mu = g(u_1, \ldots, u_n)\,du_1 \cdots du_n$ determined up to a constant factor by the property that its integral over domains is invariant with respect to the group $G$, $d\mu$ defines up to a constant factor an elementary probability. The constant factor is determined by fixing a domain $D_0$ within which all considered domains lie and by assigning to this domain the pr. one; that is, by setting $c \int_{D_0} d\mu = 1$. Then the points are said to be taken or thrown at random in $D_0$. To say that several points are taken or thrown at random means that the throws are stochastically independent; in other words, we make repeated trials.

Let $M$ with or without affixes be points in an $m$-dimensional euclidean space and let $x_1, \ldots, x_m$ with same affixes, if any, be its cartesian coordinates with respect to a fixed orthogonal frame of reference. The group $G$ which transforms points $M$ into points $M$ is the group of euclidean displacements (preserves euclidean lengths). This means that the probability is required to be independent of the choice of the frame of reference. Prove that $d\mu = c\,dx_1\,dx_2 \cdots dx_m$.

Let us now investigate straight lines in a euclidean plane determined by their equations $u_1 x_1 + u_2 x_2 = 1$ in rectangular coordinates, and let $G_x$ be the group of euclidean displacements in the plane. Prove that $d\mu = c(u_1^2 + u_2^2)^{-3/2}\,du_1\,du_2$ or, using the normal equations: $x_1 \cos\theta + x_2 \sin\theta - p = 0$, $d\mu = c\,dp\,d\theta$.

(The transformations of the group $G_x$ are of the form $x'_1 = a_1 + x_1 \cos\alpha - x_2 \sin\alpha$, $x'_2 = a_2 + x_1 \sin\alpha + x_2 \cos\alpha$ and induce transformations of a group $G$ on the plane $(u_1, u_2)$ defined by

$$u_1 = (u'_1 \cos\alpha + u'_2 \sin\alpha)/(a_1 u'_1 + a_2 u'_2 + 1),$$
$$u_2 = (-u'_1 \sin\alpha + u'_2 \cos\alpha)/(a_1 u'_1 + a_2 u'_2 + 1).$$

The invariance condition yields

$$g(u'_1, u'_2) = g(u_1, u_2)\,\frac{D(u_1, u_2)}{D(u'_1, u'_2)} \quad \text{with} \quad \frac{D(u_1, u_2)}{D(u'_1, u'_2)} = \frac{(u_1^2 + u_2^2)^{3/2}}{({u'_1}^2 + {u'_2}^2)^{3/2}}.$$

With the same group $G_x$ there is no elementary probability for circles in the plane. But there is one for circles of fixed radius.)
Points on a line. The elementary probability for a point $M$ on a segment $[0, l]$ is $dx/l$. Throw $n$ points at random on the segment. The probability, say, that there be no thrown points on $[0, x]$ is $\big(1 - \frac{x}{l}\big)^n$. What is the expected distance of the nearest to 0 of the thrown points? What is the probability that $k$ out of the $n$ thrown points lie on a fixed subinterval of length $a$? Find what happens as $l \to \infty$ with $n/l \to \lambda > 0$. Denote then by $M_1, M_2, \ldots$ the points in the nondecreasing order of their distance to 0. What is the elementary probability for the length $M_j M_{j+1}$ to be between $x$ and $x + dx$ and what is the expectation of this length?

Lines in a plane. The elementary probability of a straight line $x \cos\theta + y \sin\theta - p = 0$ thrown on a plane is $d\mu = c\,dp\,d\theta$. The integral $\iint dp\,d\theta$ over a domain induced by a family of straight lines is said to be the measure of the family. The measure of the secants of a segment of length $l$ is $2l$. ($\theta$ varies from $-\frac{\pi}{2}$ to $+\frac{\pi}{2}$ while $p$ varies from 0 to $l \cos\theta$ for every fixed $\theta$.) The measure of the secants of a polygonal line of length $l$ is $2l$, provided every secant is counted as many times as there are points of intersections of the secant with the polygonal line; in particular, the measure of the secants of a closed convex polygon is its perimeter. The same is true for the secants of a curve formed by a finite number of analytic arcs. Prove it directly for the secants of a circle.

Let $C$ and $C_0$ be two closed convex curves of respective lengths $l$ and $l_0$ with $C$ being interior to $C_0$. The probability that a secant of $C_0$ be secant of $C$ is $l/l_0$.

Application to the needle problem. If $C_0$ is a circumference of radius $r/2$ and $C$ is a segment of length $l$, then $p = 2l/\pi r$. Throw the figure formed by the circumference and the segment on a plane with parallel equidistant straight lines with common distance $r$. The probability that one of these lines intersects the segment is $2l/\pi r$. Prove it directly by throwing a needle of length $l$ on this plane.

(The position of the needle $AB$ is determined by the coordinates $x$, $y$ of $A$ and the angle $\alpha$ that $AB$ makes with $Ox$, one of the equidistant lines. The elementary probability is $dx\,dy\,d\alpha$. It is not a restriction to assume $\alpha$ between 0 and $\pi/2$, $x = 0$, and $y$ between 0 and $r$. Then $p = \dfrac{2l}{\pi r} \int_0^{\pi/2} \sin\alpha\,d\alpha$.)
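
The value $2l/\pi r$ can be approximated by simulation. The sketch below is a Monte Carlo illustration under stated assumptions: the needle is no longer than the spacing ($l \le r$), the end $A$ has a uniform ordinate between two consecutive lines, the angle is uniform between 0 and $\pi$, and the seed, the number of throws, and the function name are arbitrary choices.

```python
import random
from math import pi, sin


def needle_crossing_frequency(length, spacing, throws, seed=0):
    """Observed frequency with which a needle meets one of the parallel lines."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(throws):
        y = rng.uniform(0.0, spacing)      # ordinate of the end A
        alpha = rng.uniform(0.0, pi)       # angle of AB with the lines
        if y + length * sin(alpha) >= spacing:
            hits += 1                      # the end B lies beyond the next line
    return hits / throws


frequency = needle_crossing_frequency(length=1.0, spacing=2.0, throws=200_000)
print(frequency, 2 * 1.0 / (pi * 2.0))
```

With 200,000 throws the standard error of the observed frequency is about 0.001, so agreement to two decimal places is to be expected.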


A differential method. Let Do be a domain of the plane on which are thrown 
at random x points. Intrinsic properties of the figure formed by the points are 
defined independently of Do; for example, MM. < /, triangle M,Mo2M3 has 
acute angles, ---. 

The probability of an intrinsic property is given by $P = a/s^n$ where $s$ is the
area of $D_0$ and $a$ represents the measure of the set of favorable cases. Let $D'_0$
be a new domain containing $D_0$ and let $P + \Delta P = (a + \Delta a)/(s + \Delta s)^n$ be the
new probability of the same property. If $P_k$ is the probability of the property
when $n - k$ points are in $D_0$ and $k$ points are in $D'_0 - D_0$, then

$$a + \Delta a = a + a_1 + \cdots + a_n, \qquad a_k = \frac{n!}{k!\,(n-k)!}\,P_k\,s^{n-k}(\Delta s)^k,$$

and

$$(s + \Delta s)^n\,\Delta P = n(P_1 - P)s^{n-1}\Delta s + \cdots + \frac{n!}{k!\,(n-k)!}(P_k - P)s^{n-k}(\Delta s)^k + \cdots + (P_n - P)(\Delta s)^n.$$

Keeping infinitesimals of first order, we have

$$\delta P = n(P_1 - P)\,\frac{\delta s}{s}$$

where $n$ is the number of points thrown at random on $D_0$, $P$ is the probability
of the property, $P_1$ is the probability of the same property when 1 point is
thrown at random on an increment of $D_0$ of area $\delta s$, and $n - 1$ points are thrown
at random on $D_0$. More generally,

$$\delta m = n(m_1 - m)\,\frac{\delta s}{s}$$

where $m$ is the expectation of a function of the thrown points, and the other
quantities are defined similarly to what precedes. The method and the
formulae apply whatever be the number of dimensions of the space.
Application. Two points $M_1$ and $M_2$ are thrown at random on a segment
of length $l$. The probability that $M_1M_2 < x$ is $\frac{2x}{l} - \frac{x^2}{l^2}$. What happens when
the segment is replaced by a circle of radius $r$? Find $E\,M_1M_2$ in both cases.
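As a numerical sanity check of the differential method on the segment, the sketch below compares the two sides of $\delta P = n(P_1 - P)\,\delta s/s$ with $n = 2$ and $s = l$. It assumes $0 < x < l$ and that the added element lies at one end of the segment, so that $P_1 = x/l$; the function names and the step $h$ are illustrative choices.

```python
def prob_close(x, l):
    """P(M1M2 < x) for two independent uniform points on a segment of length l > x."""
    return 2 * x / l - (x / l) ** 2

def prob_close_one_at_end(x, l):
    """P_1: same event when one point lies in the element added at an end
    of the segment and the other is uniform on the segment."""
    return x / l

x, l, n, h = 0.3, 1.0, 2, 1e-6
# Left side: derivative of P with respect to the length, by a central difference.
lhs = (prob_close(x, l + h) - prob_close(x, l - h)) / (2 * h)
# Right side: n (P_1 - P) / s with s = l.
rhs = n * (prob_close_one_at_end(x, l) - prob_close(x, l)) / l
```

Both sides equal $-2x/l^2 + 2x^2/l^3$, so they should agree up to the discretization error of the difference quotient.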
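V. Bernoulli case and Weierstrass theorem. Consider the Bernoulli case
(with $PA = x$ in lieu of $p$): $0 \le x \le 1$,

$$p_{nk}(x) = \frac{n!}{k!\,(n-k)!}\,x^k(1-x)^{n-k}.$$

(a) $\sum_{k=0}^{n} p_{nk}(x) = 1$, $\quad ES_n = \sum_{k=0}^{n} k\,p_{nk}(x) = nx$,

$$\sigma^2 S_n = \sum_{k=0}^{n} (k - nx)^2\,p_{nk}(x) = nx(1 - x).$$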


(b) Let $f$ be a real or complex-valued continuous function on $[0, 1]$. It is
bounded: $|f| \le c < \infty$ and uniformly continuous: Given $\varepsilon > 0$ there is a $\delta > 0$
such that $|x - x'| < \delta \Rightarrow |f(x) - f(x')| < \varepsilon$. Form Bernstein polynomials

$$E(f(S_n/n)) = \sum_{k=0}^{n} f(k/n)\,p_{nk}(x),$$

that is,

$$P_n(x) = \sum_{k=0}^{n} f(k/n)\,\frac{n!}{k!\,(n-k)!}\,x^k(1 - x)^{n-k}.$$


(c) Weierstrass theorem says that on $[0, 1]$ there are polynomials which
converge uniformly to $f$.
Bernstein polynomials are such that

$$|E(f(x) - f(S_n/n))| = |f(x) - P_n(x)| = \Big|\sum_{k=0}^{n} (f(x) - f(k/n))\,p_{nk}(x)\Big| \le \Big|\sum_{|k-nx| \le n\delta}\Big| + \Big|\sum_{|k-nx| > n\delta}\Big|.$$

The first partial sum is bounded by $\varepsilon \sum_{k=0}^{n} p_{nk}(x) = \varepsilon$. The second partial sum is
bounded by

$$2c \sum_{|k-nx| > n\delta} p_{nk}(x) \le \frac{2c}{n^2\delta^2} \sum_{k=0}^{n} (k - nx)^2\,p_{nk}(x) = \frac{2c\,x(1 - x)}{n\delta^2} \le \frac{c}{2n\delta^2};$$

note that the first inequality while algebraically immediate is due to
Tchebichev's inequality:

$$P(|S_n/n - E(S_n/n)| > \delta) \le \sigma^2 S_n/n^2\delta^2.$$

Thus, for all $x \in [0, 1]$, as $n \to \infty$ then $\varepsilon \to 0$,

$$|f(x) - P_n(x)| \le \varepsilon + \frac{c}{2n\delta^2} \to 0.$$

Leaving out all references to the Bernoulli case, the most elementary proof 
known of Weierstrass theorem obtains: It introduces explicit uniformly ap- 
proximating polynomials and is primarily algebraic. 
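
The construction is explicit enough to be tried numerically. The sketch below evaluates Bernstein polynomials from the formula for $P_n(x)$ and measures the largest deviation from $f$ on a grid; the test function $|x - 1/2|$, the grid size, and the helper names are illustrative assumptions, and a grid maximum only approximates the uniform norm.

```python
from math import comb

def bernstein(f, n, x):
    """Bernstein polynomial P_n(x) = sum_k f(k/n) C(n, k) x^k (1 - x)^(n - k)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1))

def max_error(f, n, grid=101):
    """Largest |f(x) - P_n(x)| over an evenly spaced grid on [0, 1]."""
    points = [i / (grid - 1) for i in range(grid)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in points)

f = lambda x: abs(x - 0.5)    # continuous on [0, 1], not differentiable at 1/2
errors = {n: max_error(f, n) for n in (10, 40, 160)}
```

The entries of `errors` should decrease as $n$ grows, in line with the bound $\varepsilon + c/2n\delta^2$; for this particular $f$ the decrease is slow because of the corner at $1/2$.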


Part One 


NOTIONS OF MEASURE THEORY 


No rigorous presentation of probability theory is possible without 
using the notions of sets, measures, measurable functions, and inte- 
grals. Their first lineaments are already apparent in elementary prob- 
ability theory. These notions are introduced and investigated syste- 
matically in this part. 

The presentation is self-contained, and the material will suffice for 
later parts. It is organized—at the cost of a few repetitions—so as to 
make the unstarred portions independent of the starred ones and, at 
the same time, to make the sections on measurable functions, conver- 
gence, and integration independent of the remainder except for 1.1 to 1.5. 
This permits a reorganization of the course so as to proceed from the less 
abstract notions toward more abstract and more involved ones. The 
following order is possible: 1.1 to 1.5 with 5.1 to 7.2, then 3.1, 3.2 with 
8.1, suffice for practically all of the unstarred portions of Parts II, III, 
then IV. 


Chapter I 


SETS, SPACES, AND MEASURES 


§ 1. SETS, CLASSES, AND FUNCTIONS 


1.1 Definitions and notations. A set is a collection of arbitrary
elements. By an abuse of language, an empty set is a "set with no
elements."

Unless otherwise stated, all sets will be sets of elements of a fixed
nonempty set $\Omega$, to be called a space. Elements of $\Omega$ will be called
points and denoted by $\omega$, with or without affixes (such as subscripts,
superscripts, primes, etc.). Capitals $A, B, C, \dots$, with or without
affixes, will denote sets of points, $\{\omega\}$ will denote a set consisting of
the one point $\omega$, and $\emptyset$ will denote the empty set, that is, the set
"containing no points." If $\omega$ is a point of $A$, we write $\omega \in A$ and, if $\omega$ is
not a point of $A$, we write $\omega \notin A$.

A set of sets is called a class and classes will be denoted by $\mathcal{A}, \mathcal{B}, \mathcal{C}, \dots$,
with or without affixes. The class of all the sets in $\Omega$ is called the
space of sets in $\Omega$ and will be denoted by $S(\Omega)$. Thus a class of sets in
$\Omega$ is a set in $S(\Omega)$ and all set notions and operations apply to classes
considered as sets in the corresponding space of sets.

$A$ is said to be a subset of $B$, or included in $B$, or contained in $B$, if all
points of $A$ are points of $B$; we then write $A \subset B$ or, equivalently,
$B \supset A$. In symbols, if $\omega \in A$ implies $\omega \in B$, then $A \subset B$, and
conversely. Clearly, for every set $A$,

$$\emptyset \subset A \subset \Omega,$$

and the relation of inclusion is reflexive and transitive:

$$A \subset A; \qquad A \subset B \text{ and } B \subset C \text{ imply } A \subset C.$$

$A$ and $B$ are said to be equal if $A \subset B$ and $B \subset A$; we then write $A = B$.
Clearly, the relation of equality is reflexive, transitive, and symmetric:

$$A = A; \qquad A = B \text{ and } B = C \text{ imply } A = C; \qquad A = B \text{ implies } B = A.$$


1.2 Differences, unions, and intersections. The difference $A - B$ is
the set of all points of $A$ which do not belong to $B$; in symbols, if $\omega \in A$
and $\omega \notin B$, then $\omega \in A - B$, and conversely. The particular
difference $\Omega - A$, that is, the set of all points which do not belong to $A$, is
called the complement of $A$ and is denoted by $A^c$.

The intersection $A \cap B$, or simply $AB$, is the set of all points common
to $A$ and $B$; in symbols, if $\omega \in A$ and $\omega \in B$, then $\omega \in AB$ and
conversely. The union $A \cup B$ is the set of all points which belong to at
least one of the sets $A$ or $B$; in symbols, if $\omega \in A$ or $\omega \in B$, then
$\omega \in A \cup B$ and conversely. If $AB = \emptyset$, then $A$ and $B$ are said to be
disjoint, and their union is then denoted by $A + B$ and called a sum.

It follows from the definitions that the operations of intersection and 
union are associative, commutative, and distributive: 


$$(A \cup B) \cup C = A \cup (B \cup C), \qquad (AB)C = A(BC);$$
$$A \cup B = B \cup A, \qquad AB = BA;$$
$$(A \cup B)C = AC \cup BC, \qquad (A \cup B)(A \cup C) = A \cup BC.$$


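Moreover, the operation of complementation has the following
properties:

$$A \subset B \text{ implies } A^c \supset B^c;$$
$$\Omega^c = \emptyset, \quad \emptyset^c = \Omega, \quad AA^c = \emptyset, \quad A + A^c = \Omega, \quad (A^c)^c = A;$$
$$A - B = AB^c, \quad (A \cup B)^c = A^cB^c, \quad (AB)^c = A^c \cup B^c.$$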
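
These identities can be verified exhaustively on a small space. The sketch below does so for a four-point space; the space, the helper names, and the choice of identities tested are assumptions made for illustration, and a finite check is of course not a proof.

```python
from itertools import combinations

OMEGA = frozenset(range(4))    # a small space, chosen for illustration

def subsets(space):
    """All subsets of a finite space."""
    items = sorted(space)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def comp(a):
    """Complement relative to OMEGA."""
    return OMEGA - a

def identities_hold(a, b, c):
    """Distributivity and complementation rules for one triple of sets."""
    return (
        (a | b) & c == (a & c) | (b & c)
        and (a | b) & (a | c) == a | (b & c)
        and a - b == a & comp(b)
        and comp(a | b) == comp(a) & comp(b)
        and comp(a & b) == comp(a) | comp(b)
        and comp(comp(a)) == a
    )

ALL = subsets(OMEGA)
all_ok = all(identities_hold(a, b, c) for a in ALL for b in ALL for c in ALL)
```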


The notions of intersection and union extend at once to arbitrary
classes. Let $T$ be a set, not necessarily in $\Omega$, and to every $t \in T$
assign a set $A_t \subset \Omega$. The class $\{A_t, t \in T\}$ of all these sets, or simply
$\{A_t\}$ if there is no confusion possible, is a class assigned to the index
set $T$.

The intersection, or infimum, of all sets of $\{A_t\}$ is defined to be the
set of all those points which belong to every $A_t$, and is denoted by
$\bigcap_{t \in T} A_t$ or by $\inf_{t \in T} A_t$; we drop $t \in T$ if there is no confusion possible.
In symbols, if $\omega \in A_t$ for every $t \in T$, then $\omega \in \bigcap A_t$ and conversely.
The union, or supremum, of all sets of the class $\{A_t\}$ is defined to be
the set of all those points which belong to at least one $A_t$, and is denoted
by $\bigcup_{t \in T} A_t$ or by $\sup_{t \in T} A_t$; we drop $t \in T$ if there is no confusion possible.
In symbols, if $\omega \in A_t$ for at least one $t \in T$, then $\omega \in \bigcup A_t$ and
conversely.

If all sets of $\{A_t\}$ are pairwise disjoint, $\{A_t\}$ is said to be a disjoint
class and the union of its sets, denoted then by $\sum A_t$, is called a sum.
Conversely, the term "sum" and the symbols $\sum$ and $+$ when used
for sets of a class will imply that the class is disjoint.

If $\omega$ belongs to no $A_t$, then it belongs to every $A_t^c$,
and conversely; consequently (de Morgan rule),

$$\Big(\bigcup A_t\Big)^c = \bigcap A_t^c, \qquad \Big(\bigcap A_t\Big)^c = \bigcup A_t^c.$$


When $\{A_t\}$ is empty, that is, $T$ is empty, it is natural to make the
convention that $\bigcup_{t \in \emptyset} A_t = \emptyset$. Then, in order to preserve the foregoing
relations, we have to make the convention that $\bigcap_{t \in \emptyset} A_t = \Omega$. Thus, by
convention,

$$\bigcup_{t \in \emptyset} A_t = \emptyset, \qquad \bigcap_{t \in \emptyset} A_t = \Omega.$$

It is easily seen, collecting all the relations so far obtained, that the
following duality rule holds:

Every valid relation between sets, obtained by taking complements, unions,
and intersections, is transformed into a valid relation if, the symbols
"$=$" and "$^c$" remaining unchanged, the symbols $\cap$, $\subset$, and $\emptyset$ are
interchanged with the symbols $\cup$, $\supset$, and $\Omega$, respectively.


Operations performed on elements of "countable" classes will play
a prominent role later in connection with the notion of measure. A
set, or a class, is said to be finite, or denumerable, according as its
elements can be put in a one-to-one correspondence with the set
$\{1, 2, \dots, n\}$ of the first $n$ positive integers, for some value of $n$, or with the
set of all positive integers $\{1, 2, \dots$ ad infinitum$\}$. It is said to be
countable if it is either finite or denumerable. Similarly, operations
performed on elements of finite, denumerable, or countable classes will
be said to be finite, denumerable, or countable operations, respectively.

The following immediate transformation of countable unions into 
countable sums will prove useful in connection with the notion of 
measure: 


$$\bigcup A_j = A_1 + A_1^cA_2 + A_1^cA_2^cA_3 + \cdots.$$
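
For a finite list of sets the transformation of a union into a sum is a one-pass computation. A minimal sketch, with an illustrative function name and example sets:

```python
def disjointify(sets):
    """Return B_1, B_2, ... with B_j = A_1^c ... A_{j-1}^c A_j, so that the B_j
    are pairwise disjoint and have the same union as the A_j."""
    seen = set()
    pieces = []
    for a in sets:
        pieces.append(set(a) - seen)    # points of A_j not in any earlier set
        seen |= set(a)
    return pieces

A = [{1, 2, 3}, {2, 3, 4}, {1, 5}, {6}]
B = disjointify(A)
```

Each piece keeps only the points that appear for the first time, which is exactly the term $A_1^c \cdots A_{j-1}^c A_j$ of the sum.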


1.3 Sequences and limits. To every value of $n = 1, 2, \dots$, assign
a set $A_n$; these sets $A_n$, whether distinct or not, are distinguished by
their indices. The ordered denumerable class $A_1, A_2, \dots$, is called
sequence $A_n$. The set of all those points which belong to almost all
$A_n$ (all but any finite number) is called the inferior limit of $A_n$, and is
denoted by $\liminf A_n$. Clearly

$$\liminf A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k.$$

The set of all those points which belong to infinitely many $A_n$ is called
the superior limit of $A_n$ and is denoted by $\limsup A_n$. Since every
point which belongs to almost all $A_n^c$ belongs to a finite number of $A_n$
only, and conversely, it follows, by duality, that

$$\limsup A_n = \Big(\bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k^c\Big)^c = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$$

Every point which belongs to almost all $A_n$ belongs to infinitely many
$A_n$, so that

$$\liminf A_n \subset \limsup A_n.$$

Thus, if the reverse inclusion is true, $\liminf A_n$ and $\limsup A_n$ are
equal to the same set $A$. Then $A$ is called the limit of $A_n$ and is denoted
by $\lim A_n$; the sequence $A_n$ is said to converge to $A$ and we write $A_n \to A$.
Clearly, limits (inferior or superior) of sequences of sets are formed by
denumerable set operations.

Monotone sequences form a basic class of convergent sequences. A
sequence $A_n$ is said to be monotone if it is either nondecreasing:
$A_1 \subset A_2 \subset \cdots$, and we then write $A_n \uparrow$; or if it is nonincreasing:
$A_1 \supset A_2 \supset \cdots$, and we then write $A_n \downarrow$. From the expressions above of inferior and
superior limits, it follows at once that

every monotone sequence is convergent, and $\lim A_n = \bigcup A_n$ or $\bigcap A_n$
according as $A_n \uparrow$ or $A_n \downarrow$.

Moreover, if we consider this proposition as a definition of limits of
monotone sequences then, since for an arbitrary sequence $B_n$,

$$\bigcap_{k=n}^{\infty} B_k = \inf_{k \ge n} B_k \uparrow \quad\text{and}\quad \bigcup_{k=n}^{\infty} B_k = \sup_{k \ge n} B_k \downarrow,$$

it follows that its inferior and superior limits can be defined by

$$\liminf B_n = \lim_n \Big(\inf_{k \ge n} B_k\Big) \quad\text{and}\quad \limsup B_n = \lim_n \Big(\sup_{k \ge n} B_k\Big).$$
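
Inferior and superior limits involve infinitely many sets, but they can be computed exactly when the sequence is eventually periodic. The sketch below assumes such a sequence, given as a finite `prefix` followed by a `cycle` repeated forever (both names are illustrative), and applies the two formulas above; under that assumption every tail beyond the prefix contains exactly the sets of the cycle, so finitely many tails suffice.

```python
from functools import reduce

def tail_sets(prefix, cycle, n):
    """Distinct sets A_k, k >= n (0-based), of the sequence prefix, cycle, cycle, ..."""
    return list(prefix[n:]) + list(cycle)

def lim_inf(prefix, cycle):
    """Union over n of the intersections of the tails from n on."""
    tails = [reduce(set.intersection, tail_sets(prefix, cycle, n))
             for n in range(len(prefix) + 1)]
    return reduce(set.union, tails)

def lim_sup(prefix, cycle):
    """Intersection over n of the unions of the tails from n on."""
    tails = [reduce(set.union, tail_sets(prefix, cycle, n))
             for n in range(len(prefix) + 1)]
    return reduce(set.intersection, tails)

prefix = [{0, 1, 2, 3}]
cycle = [{1, 2}, {2, 3}]
inferior = lim_inf(prefix, cycle)    # points in almost all sets
superior = lim_sup(prefix, cycle)    # points in infinitely many sets
```

Here the point 2 belongs to every set of the cycle, while 1 and 3 each recur infinitely often without belonging to almost all sets, so the sequence does not converge.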
1.4 Indicators of sets. Set operations can be replaced by equivalent
but more familiar ones, in the following manner. To every set $A$
assign a function $I_A$ of $\omega$, to be called the indicator of $A$, defined by

$$I_A(\omega) = 1 \text{ or } 0 \quad\text{according as}\quad \omega \in A \text{ or } \omega \notin A.$$

Conversely, every function of $\omega$ which can take only the values 0 and 1
is the indicator of the set for the points of which it takes the value 1.
The one-to-one correspondences (denoted by $\Leftrightarrow$) and relations listed
below are immediate.

$$I_A \le I_B \Leftrightarrow A \subset B, \quad I_A = I_B \Leftrightarrow A = B, \quad I_{AB} = 0 \Leftrightarrow AB = \emptyset,$$
$$I_\emptyset = 0, \quad I_\Omega = 1, \quad I_A + I_{A^c} = 1,$$
$$I_{\inf A_t} = \inf I_{A_t}, \quad I_{\sup A_t} = \sup I_{A_t},$$
$$I_{\bigcap A_j} = \prod I_{A_j}, \quad I_{\sum A_j} = \sum I_{A_j},$$
$$I_{\bigcup A_j} = I_{A_1} + (1 - I_{A_1})I_{A_2} + (1 - I_{A_1})(1 - I_{A_2})I_{A_3} + \cdots,$$
$$I_{\liminf A_n} = \liminf I_{A_n}, \quad I_{\limsup A_n} = \limsup I_{A_n}, \quad I_{\lim A_n} = \lim I_{A_n}.$$
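
The replacement of set operations by arithmetic on indicators can be illustrated on a finite space. In the sketch below the space, the three sets, and the helper names are assumptions chosen for the example.

```python
def indicator(a):
    """Indicator I_A as a function of the point w."""
    return lambda w: 1 if w in a else 0

OMEGA = range(8)
A1, A2, A3 = {0, 1, 2}, {2, 3}, {1, 5}
I1, I2, I3 = indicator(A1), indicator(A2), indicator(A3)

def union_formula(w):
    """I of the union, via I_{A1} + (1 - I_{A1}) I_{A2} + (1 - I_{A1})(1 - I_{A2}) I_{A3}."""
    return I1(w) + (1 - I1(w)) * I2(w) + (1 - I1(w)) * (1 - I2(w)) * I3(w)

union_ok = all(union_formula(w) == indicator(A1 | A2 | A3)(w) for w in OMEGA)
product_ok = all(I1(w) * I2(w) == indicator(A1 & A2)(w) for w in OMEGA)
```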


1.5 Fields and $\sigma$-fields. Classes of sets in $\Omega$ are sets in the space
$S(\Omega)$ of all sets in $\Omega$ and thus what precedes applies to classes.
However, there is a notion specific to classes: that of closure under one or
more set operations. A class $\mathcal{C}$ is said to be closed under a set
operation if the sets obtained by performing this operation on sets of $\mathcal{C}$ are
sets of $\mathcal{C}$. In particular, the class $S(\Omega)$ of all sets in $\Omega$ is closed under
every set operation.

In connection with the notions of measurability and of measure, two
species of classes play a prominent role: fields and $\sigma$-fields. A field is
a (nonempty) class closed under all finite set operations; clearly, every
field contains $\emptyset$ and $\Omega$. A $\sigma$-field is a (nonempty) class closed under all
countable set operations; clearly every $\sigma$-field is a field. We observe
that, because of the duality rule, closure under complementations and
finite (countable) intersections implies closure under finite (countable)
unions. Also we can interchange in this property "intersections" and
"unions."

Let $\mathcal{S}$-classes be species of classes closed under set operations $\mathcal{S}$; for
example, the species of fields or the species of $\sigma$-fields. We observe that
$S(\Omega)$ is an $\mathcal{S}$-class, whatever be the set operations $\mathcal{S}$.


a. Arbitrary intersections of $\mathcal{S}$-classes are $\mathcal{S}$-classes. In particular,
arbitrary intersections of fields or of $\sigma$-fields are fields or $\sigma$-fields, respectively.

For every set of the intersection of a collection of $\mathcal{S}$-classes belongs to every one of
these classes. Therefore, performing operations $\mathcal{S}$ on sets of the
intersection, we obtain sets belonging to every one of these classes, that is,
to the intersection.

This property gives rise to the notion of a "minimal" $\mathcal{S}$-class over a
given class. An $\mathcal{S}$-class $\mathcal{C}'$ containing $\mathcal{C}$ is a minimal class over $\mathcal{C}$ or
the $\mathcal{S}$-class generated by $\mathcal{C}$ if every $\mathcal{S}$-class containing $\mathcal{C}$ contains $\mathcal{C}'$.

b. There is one, and only one, minimal $\mathcal{S}$-class over a class $\mathcal{C}$. In
particular, there is one, and only one, minimal field and one, and only one,
minimal $\sigma$-field over $\mathcal{C}$.

For the intersection of all $\mathcal{S}$-classes containing $\mathcal{C}$ contains $\mathcal{C}$ and is
contained in every $\mathcal{S}$-class containing $\mathcal{C}$.

A space $\Omega$ in which is selected a fixed $\sigma$-field $\mathcal{A}$ is called a measurable
space $(\Omega, \mathcal{A})$. If there is no confusion possible, the sets of $\mathcal{A}$ are said to
be measurable.
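
On a finite space the minimal field over a class can be computed outright. The sketch below does not intersect all fields containing the class, as in the argument for b; it instead closes the class under complements and unions until nothing new appears, which yields the same minimal field (and, the space being finite, the minimal $\sigma$-field). The function name and the example generators are illustrative assumptions.

```python
def generated_field(omega, generators):
    """Smallest class of subsets of omega that contains the generators and is
    closed under complementation and finite union."""
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        for a in current:
            for candidate in [omega - a] + [a | b for b in current]:
                if candidate not in field:
                    field.add(candidate)
                    changed = True
    return field

F = generated_field(range(4), [{0}, {1, 2}])
```

For these generators the field consists of all unions of the three "atoms" $\{0\}$, $\{1, 2\}$, $\{3\}$, hence of $2^3 = 8$ sets.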

1.6 Monotone classes. We shall need the notion of monotone
classes in connection with the problem of extending measures on a
field to its minimal $\sigma$-field. A monotone class is a class closed under
formation of limits of monotone sequences.

a. A $\sigma$-field is a monotone field and conversely.

The first assertion is obvious and the second follows from the fact that
every countable intersection $\bigcap_{k=1}^{\infty} A_k$ and union $\bigcup_{k=1}^{\infty} A_k$ is a monotone
limit of sequences $\bigcap_{k=1}^{n} A_k$ and $\bigcup_{k=1}^{n} A_k$ of finite intersections and unions.
The property we shall require is as follows:

A. The minimal monotone class $\mathcal{M}$ and the minimal $\sigma$-field $\mathcal{A}$ over the
same field $\mathcal{C}$ coincide.


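Proof. On account of a and minimality of $\mathcal{M}$ and $\mathcal{A}$, it suffices to
prove that $\mathcal{M}$ is a field; for, a monotone field $\mathcal{M}$ is a $\sigma$-field so that
$\mathcal{M} \supset \mathcal{A}$, and the $\sigma$-field $\mathcal{A}$ is monotone so that $\mathcal{M} \subset \mathcal{A}$. Since
$\mathcal{M} \supset \mathcal{C} \ni \Omega$ and unions are reducible to intersections (by means of
complementations), it suffices to prove that, if $A$ and $B$ belong to $\mathcal{M}$, so do $AB$,
$A^cB$, and $AB^c$.

For every fixed $A \in \mathcal{M}$, let $\mathcal{M}_A$ be the class of all $B \in \mathcal{M}$ with the
asserted property. Every $\mathcal{M}_A$ is monotone for, if the sequence $B_n \in \mathcal{M}_A$
is monotone, then $B = \lim B_n$ belongs to $\mathcal{M}$ and so do the limits of
monotone sequences

$$AB = \lim AB_n, \qquad A^cB = \lim A^cB_n, \qquad AB^c = \lim AB_n^c.$$

It follows that, for every $A \in \mathcal{C}$, the class $\mathcal{M}_A$ coincides with $\mathcal{M}$. For
$\mathcal{C}$ being a field, every $B \in \mathcal{C}$ belongs to $\mathcal{M}_A$, so that $\mathcal{C} \subset \mathcal{M}_A \subset \mathcal{M}$ and,
hence, $\mathcal{M}$ being minimal over $\mathcal{C}$, $\mathcal{M}_A = \mathcal{M}$. In fact, $\mathcal{M}_B = \mathcal{M}$ for every
$B \in \mathcal{M}$. For, the conditions imposed upon pairs $A$, $B$ being symmetric,
$B \in \mathcal{M}$ ($= \mathcal{M}_A$ for $A \in \mathcal{C}$) is equivalent to $A \in \mathcal{M}_B$ for every $A \in \mathcal{C}$
so that $\mathcal{C} \subset \mathcal{M}_B$ and hence, as above, $\mathcal{M}_B = \mathcal{M}$. But this last property
means that $\mathcal{M}$ is a field, and the proof is complete.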

*1.7 Product sets. We introduce now a different type of set
operation and corresponding notions, for which we shall have need later.
Let $A_1$ and $A_2$ be two arbitrary sets with elements $\omega_1$ and $\omega_2$,
respectively. By the product set $A_1 \times A_2$ we shall mean the set of all ordered
pairs $\omega = (\omega_1, \omega_2)$ where $\omega_1 \in A_1$ and $\omega_2 \in A_2$. If $A_1, B_1, \dots$ are
sets in a space $\Omega_1$ and $A_2, B_2, \dots$ are sets in a space $\Omega_2$, then $A_1 \times A_2$,
$B_1 \times B_2, \dots$ are sets in the product space $\Omega_1 \times \Omega_2$, called intervals or
rectangles in $\Omega_1 \times \Omega_2$ and the properties below follow readily from the
definition:

$$(A_1 \times A_2) \cap (B_1 \times B_2) = (A_1 \cap B_1) \times (A_2 \cap B_2)$$
$$(A_1 \times A_2) - (B_1 \times B_2) = (A_1 - B_1) \times (A_2 - B_2) + (A_1 - B_1) \times (A_2 \cap B_2) + (A_1 \cap B_1) \times (A_2 - B_2)$$

In turn, it follows at once from these relations that

a. If $\mathcal{C}_1$ and $\mathcal{C}_2$ are fields of sets in $\Omega_1$ and $\Omega_2$ respectively, then the class
of all finite sums of intervals $A_1 \times A_2$, where $A_1 \in \mathcal{C}_1$ and $A_2 \in \mathcal{C}_2$,
is a field of sets in $\Omega_1 \times \Omega_2$.

This field will be called the product field of $\mathcal{C}_1$ and $\mathcal{C}_2$.
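
The two relations between intervals on which a. rests can be checked on small examples. In the sketch below the sets and the helper `rect` are illustrative assumptions; the last line confirms that the three pieces of the difference are disjoint, so that the "+" signs denote a sum in the sense of 1.2.

```python
from itertools import product

def rect(a, b):
    """Product set (interval) a x b as a set of ordered pairs."""
    return set(product(a, b))

A1, A2 = {1, 2, 3}, {"x", "y"}
B1, B2 = {2, 3, 4}, {"y", "z"}

intersection_ok = rect(A1, A2) & rect(B1, B2) == rect(A1 & B1, A2 & B2)

pieces = [rect(A1 - B1, A2 - B2), rect(A1 - B1, A2 & B2), rect(A1 & B1, A2 - B2)]
difference_ok = rect(A1, A2) - rect(B1, B2) == set().union(*pieces)
pieces_disjoint = sum(len(p) for p in pieces) == len(set().union(*pieces))
```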

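Yet, if $\mathcal{A}_1$ and $\mathcal{A}_2$ are $\sigma$-fields of sets in $\Omega_1$ and $\Omega_2$, respectively, then
the product field of $\mathcal{A}_1$ and $\mathcal{A}_2$ is not necessarily a $\sigma$-field. The minimal
$\sigma$-field over it will be called the product $\sigma$-field $\mathcal{A}_1 \times \mathcal{A}_2$. If $(\Omega_1, \mathcal{A}_1)$
and $(\Omega_2, \mathcal{A}_2)$ are measurable spaces, then their product measurable space
is, by definition, $(\Omega_1 \times \Omega_2, \mathcal{A}_1 \times \mathcal{A}_2)$.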

Let $\Omega = \Omega_1 \times \Omega_2$ and $\mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2$. If $A \subset \Omega$ is measurable and
$\omega_1 \in \Omega_1$ is a fixed point, then the set $A(\omega_1)$ of all points $\omega_2 \in \Omega_2$ such
that $\omega = (\omega_1, \omega_2) \in A$ is called the section of $A$ at $\omega_1$; similarly for the
section $A(\omega_2)$ at $\omega_2 \in \Omega_2$; by the definition, $A(\omega_1) \subset \Omega_2$ and
$A(\omega_2) \subset \Omega_1$.

b. Every section of a measurable set is measurable.

For let $\mathcal{C}$ be the class of all measurable sets in $\Omega$ whose sections are
measurable. It is easily seen that $\mathcal{C}$ is a $\sigma$-field. On the other hand,
if $A = A_1 \times A_2$ is a measurable interval, that is, $A_1$ and $A_2$ are
measurable, then every section of $A$ is either empty or is $A_1$ or $A_2$, so that
$A_1 \times A_2 \in \mathcal{C}$. Therefore, $\mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2$, being the minimal $\sigma$-field over
all measurable intervals, is contained in $\mathcal{C}$, and the assertion is proved.


The foregoing definitions and properties extend at once to any finite 
number of sets and of measurable spaces. However, in the nonfinite 
case, some of these definitions have to be modified in order to preserve 
these properties. 

Let $\{A_t, t \in T\}$ be an arbitrary collection of arbitrary sets $A_t$ in
arbitrary spaces $\Omega_t$ of points $\omega_t$. The product set $A_T = \prod_{t \in T} A_t$ is the set
of all the new elements $\omega_T = (\omega_t, t \in T)$ such that $\omega_t \in A_t$ for every
$t \in T$. The product set $A_T$ is in the product space $\Omega_T = \prod_{t \in T} \Omega_t$; we drop
"$t \in T$" if there is no confusion possible. It follows from the foregoing
definition that, for any set $B$, when the $\Omega_t$ are identical

$$\Big(\bigcap A_t\Big) \times B = \bigcap (A_t \times B), \qquad \Big(\bigcup A_t\Big) \times B = \bigcup (A_t \times B).$$


Let $T_N = (t_1, \dots, t_N)$ be a finite index subset and let $A_{T_N}$ be a set in
the product space $\Omega_{T_N}$. The set $A_{T_N} \times \Omega_{T - T_N}$ is a cylinder in $\Omega_T$ with
base $A_{T_N}$. If the base is a product set $\prod_{t \in T_N} A_t$, the cylinder becomes a
product cylinder or an interval in $\Omega_T$ with sides $A_t$, $t \in T_N$. Let $\mathcal{C}_t$ be
fields in $\Omega_t$. It is easily seen that, as in the finite case,

A. The class of all finite sums of all the intervals in $\Omega_T$ with sides $A_t \in \mathcal{C}_t$,
is a field of sets in $\Omega_T$.

This field is the product field of the fields $\mathcal{C}_t$.

Let $(\Omega_t, \mathcal{A}_t)$ be measurable spaces. The minimal $\sigma$-field over the
product field of the $\mathcal{A}_t$ is the product $\sigma$-field $\mathcal{A}_T = \prod \mathcal{A}_t$ of measurable
sets in $\Omega_T$, and the measurable space $(\Omega_T, \mathcal{A}_T)$ is the product measurable
space $(\prod \Omega_t, \prod \mathcal{A}_t)$ of the measurable spaces $(\Omega_t, \mathcal{A}_t)$. It is easily seen,
as in the finite case, that b remains valid:

B. Sections at $\omega_{T_N}$ of measurable sets in $\Omega_T$ are measurable sets in $\Omega_{T - T_N}$.


*1.8 Functions and inverse functions. Perhaps the most important
notion of mathematics is that of function (or transformation, or
mapping, or correspondence). We have already encountered functions
defined on an index set $T$ whose "values" are sets in $\Omega$. In general, a
function $X$ on a space $\Omega$ (the domain of $X$) to a space $\Omega'$ (the range
space of $X$) is defined by assigning to every point $\omega \in \Omega$ a point $\omega' \in \Omega'$
called the value of $X$ at $\omega$ and denoted by $X(\omega)$. Sets and classes of
sets in $\Omega'$ will be denoted by $A', B', \dots$, and $\mathcal{A}', \mathcal{B}', \dots$, respectively.
It will be assumed, once and for all, that functions are single-valued,
that is, to every given $\omega \in \Omega$ corresponds one, and only one, value
$X(\omega)$.

The set of values of $X$ for all $\omega \in A$ is called the image $X(A)$ of
$A$ (by $X$) and the class of images $X(A)$ for all $A \in \mathcal{C}$ is called the image
$X(\mathcal{C})$ of $\mathcal{C}$ (by $X$); in particular $X(\Omega)$ is the range (of all values) of $X$.
Thus, a function $X$ on $\Omega$ to $\Omega'$ determines a function on $S(\Omega)$ to $S(\Omega')$.
While this new function is of no great interest, such is not the case for
the inverse function that we shall introduce now.

By $[\omega; \cdots]$ where $\cdots$ stands for expressions and/or relations
involving functions on $\Omega$, we denote the set of points $\omega \in \Omega$ for which these
expressions are defined and/or these relations are valid; if there is no
confusion possible we drop "$\omega;$". Thus, $[X = \omega']$, or inverse image of
$\omega'$, is the set of all points $\omega$ for which $X(\omega) = \omega'$; $[X \in A']$, or inverse
image of $A'$, is the set of all points $\omega$ for which $X(\omega) \in A'$; and
$[A;\ X(A) \in \mathcal{C}']$, or inverse image of $\mathcal{C}'$, is the class of inverse images of all
sets $A' \in \mathcal{C}'$. We observe that the inverse image of an $\omega'$ which does
not belong to the range of $X$ is the empty set $\emptyset$ in $\Omega$.

The inverse function $X^{-1}$ of $X$ is defined by assigning to every $A'$
its inverse image $[X \in A']$. In other words, $X^{-1}$ is a function on
$S(\Omega')$ to $S(\Omega)$ with values $X^{-1}(A') = [X \in A']$; if $A' = \{\omega'\}$, then we
write $X^{-1}(\omega')$ for $X^{-1}(\{\omega'\}) = [X = \omega']$. Since $X$ is single-valued, $X^{-1}$
generates a partition of $\Omega$ into disjoint inverse images of points $\omega' \in \Omega'$.
It follows readily that

$$X^{-1}(A' - B') = X^{-1}(A') - X^{-1}(B'),$$
$$X^{-1}\Big(\bigcup A'_t\Big) = \bigcup X^{-1}(A'_t), \qquad X^{-1}\Big(\bigcap A'_t\Big) = \bigcap X^{-1}(A'_t).$$
Therefore,

A. BASIC PROPERTY OF INVERSE FUNCTIONS: Inverse functions preserve
all set and class inclusions and operations.

It follows at once that

If $\mathcal{C}'$ is closed under a set operation so is $X^{-1}(\mathcal{C}')$. In particular, the
inverse image of a $\sigma$-field is a $\sigma$-field, and the inverse image of the
minimal $\sigma$-field over $\mathcal{C}'$ is the minimal $\sigma$-field over $X^{-1}(\mathcal{C}')$.

Moreover,

If $\mathcal{A}$ is a $\sigma$-field so is the class of all sets whose inverse images belong to $\mathcal{A}$.
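
The basic property is easy to observe on a finite domain. The sketch below uses the squaring function on a few integers as an illustrative, non-injective $X$; the helper names and the sets are assumptions. The final lines add a contrast not spelled out above: direct images, unlike inverse images, need not preserve intersections.

```python
def preimage(X, domain, target):
    """Inverse image [X in target]: the points w of the domain with X(w) in target."""
    return {w for w in domain if X(w) in target}

OMEGA = set(range(-3, 4))
X = lambda w: w * w    # single-valued but not one-to-one
A, B = {0, 1, 4}, {4, 9}

inv = lambda s: preimage(X, OMEGA, s)
difference_ok = inv(A - B) == inv(A) - inv(B)
union_ok = inv(A | B) == inv(A) | inv(B)
intersection_ok = inv(A & B) == inv(A) & inv(B)

# Direct images: {-1} and {1} are disjoint, yet their images coincide.
image = lambda s: {X(w) for w in s}
image_breaks_intersection = image({-1} & {1}) != image({-1}) & image({1})
```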
The notion of function can be "iterated" as follows. Let $X$ be a
function on $\Omega$ to $\Omega'$ and let $X'$ be a function on $\Omega'$ to $\Omega''$. Then, the
function of function $X'X$ defined by $(X'X)(\omega) = X'(X(\omega))$ is a function
on $\Omega$ to $\Omega''$. Clearly, its inverse function $(X'X)^{-1}$ is a function on
$S(\Omega'')$ to $S(\Omega)$ such that, for every set $A'' \subset \Omega''$,

$$(X'X)^{-1}(A'') = X^{-1}(X'^{-1}(A''))$$

or, in a condensed form,

$$(X'X)^{-1} = X^{-1}X'^{-1}.$$


*1.9 Measurable spaces and functions. So far, we did not consider
particular species of functions. There are two species which play a
basic role in abstract analysis. We shall introduce them now. But
first we examine, in more detail, the class of inverse images of points
of the range space.

Let $X$ be a function on $\Omega$ to $\Omega'$. The partition of $\Omega$ formed by the
inverse images $X^{-1}(\omega')$ of all points $\omega' \in \Omega'$ is said to be induced (or
determined) by $X$ and $X$ is said to be constant ($= \omega'$) on $X^{-1}(\omega')$. Since
the class of values $X^{-1}(A')$ of $X^{-1}$ is the inverse image of the $\sigma$-field
of all sets $A'$ in $\Omega'$, it is a $\sigma$-field. If the partition induced by $X$ is finite,
or denumerable, or countable, then $X$ is said to be finitely, or denumerably,
or countably valued, respectively; in other words, $X$ is, say, countably
valued if the set of its values is countable. Setting $A_j = [X = \omega'_j]$,
we can write every countably valued function $X$ as a countable
combination of indicators:

$$X = \sum_j \omega'_j I_{A_j}.$$

Conversely, we make the convention that every time such a "sum" is
written, the sets $A_j$ form a partition of the domain of the function $X$.
If the $\omega'_j$ are distinct, then this partition is the one induced by the
function represented by the "sum."

Now, let $\mathcal{A}$ be a fixed $\sigma$-field in $\Omega$. $\Omega$, together with $\mathcal{A}$, is called a
measurable space $(\Omega, \mathcal{A})$, and the sets of $\mathcal{A}$ are then said to be measurable
(although this terminology derives from the notion of measure, we
emphasize that, nowadays, the notion of measurability is independent of
that of measure). A countably valued function $X = \sum \omega'_j I_{A_j}$, where
the sets $A_j$ are measurable, is called a countably valued measurable
function, or for short, an elementary function; if $X$ is finitely valued, then
this elementary function is also called a simple function. Clearly

the sets of the $\sigma$-field induced by an elementary function are measurable.

We are now in a position to introduce the general notion of measurable
functions. However, there are several ways for doing so, and the
classes of measurable functions so defined are, in general, not the same.

One way of defining measurable functions is to extend a basic property
of inverse functions of elementary functions, as follows: Let $(\Omega, \mathcal{A})$ and
$(\Omega', \mathcal{A}')$ be two measurable spaces. The inverse images by elementary
functions on $\Omega$ to $\Omega'$ of measurable sets are measurable. Extending this
property, we say that a function $X$ on $\Omega$ to $\Omega'$ is measurable if the
inverse images by $X$ of measurable sets ($\in \mathcal{A}'$) are measurable ($\in \mathcal{A}$). If,
moreover, $(\Omega'', \mathcal{A}'')$ is a measurable space and $X'$ on $\Omega'$ to $\Omega''$ is a
measurable function, then $X'X$ is measurable, for

$$(X'X)^{-1}(\mathcal{A}'') = X^{-1}(X'^{-1}(\mathcal{A}'')) \subset X^{-1}(\mathcal{A}') \subset \mathcal{A}.$$

Thus, with this definition, a measurable function of a measurable function
is measurable.
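
On finite spaces the first definition of measurability can be tested directly, since every $\sigma$-field is then a finite class. The sketch below is an illustration under that assumption; the three spaces, their $\sigma$-fields, and the functions `X`, `X1`, `Y` are all hypothetical examples.

```python
def preimage(X, domain, target):
    """Inverse image of target by X, as a frozenset of points of the domain."""
    return frozenset(w for w in domain if X(w) in target)

def is_measurable(X, domain, sigma_domain, sigma_range):
    """True if the inverse image of every set of sigma_range belongs to sigma_domain."""
    return all(preimage(X, domain, a) in sigma_domain for a in sigma_range)

fs = frozenset
OMEGA, OMEGA1, OMEGA2 = fs({1, 2, 3, 4}), fs({"a", "b"}), fs({0, 1})
SIGMA = {fs(), fs({1, 2}), fs({3, 4}), OMEGA}      # sigma-field on OMEGA
SIGMA1 = {fs(), fs({"a"}), fs({"b"}), OMEGA1}      # all subsets of OMEGA1
SIGMA2 = {fs(), fs({0}), fs({1}), OMEGA2}          # all subsets of OMEGA2

X = lambda w: "a" if w <= 2 else "b"     # constant on the atoms {1, 2} and {3, 4}
X1 = lambda v: 0 if v == "a" else 1
composite = lambda w: X1(X(w))
Y = lambda w: "a" if w == 1 else "b"     # splits the atom {1, 2}
```

`X`, `X1`, and their composite pass the test, while `Y` fails it because the inverse image of $\{a\}$ is $\{1\}$, which is not a set of `SIGMA`.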

Another way of defining measurable functions is as follows: Let
$(\Omega, \mathcal{A})$ be a measurable space on which are defined simple (elementary)
functions to a space $\Omega'$ (there are no measurable sets in $\Omega'$). A notion
of limit is introduced on $\Omega'$, and measurable functions in the sense of this
limit are then defined to be limits of convergent sequences of simple
(elementary) functions. This approach is particularly suited for the
introduction of integrals of measurable functions. Later we shall see
cases in which measurable sets and the notion of limit are selected in
such a manner that the two definitions are equivalent.


*§ 2. TOPOLOGICAL SPACES 


The selections of measurable sets and of concepts of limit in range
spaces are rooted in the properties of the euclidean line: real line
$R = (-\infty, +\infty)$ with euclidean distance $|x - y|$ of points (numbers, reals)
$x$, $y$. Species of spaces vary according to the preserved amount of
these properties, an amount which increases as we pass from separated
spaces to metric spaces, then to Banach spaces and to Hilbert spaces.
We examine here the basic properties of these spaces and shall encounter
them in various guises throughout the book. At the same time, the
few notions of topology which follow are a recapitulation of the
properties of the euclidean line and, more generally, of euclidean spaces.
We urge the reader to keep this fact constantly in mind by illustrating
the concepts and their relationships in terms of euclidean spaces; for
this reason, we denote here the points by $x$, $y$, $z$, with or without affixes.
Points, sets, and classes will be those of the space $\mathcal{X}$ under
consideration, unless otherwise stated.

We use without comment the axiom of choice: given a nonempty class
of nonempty sets, there exists a function which assigns to every set of
the class a point belonging to this set; in other words, we can always
"choose" a point from every one of the sets of the class.

2.1 Topologies and limits. A class $\mathcal{O}$ is a topology or the class of
open sets if it is closed under formation of arbitrary unions and finite
intersections and contains $\emptyset$ and $\mathcal{X}$ (the last property follows from the
closure property by the conventions relative to intersections and unions
of sets of an empty class). The dual class of complements of open
sets is the class of closed sets; hence it is closed under formation of
arbitrary intersections and finite unions and contains $\mathcal{X}$ and $\emptyset$.

A topological space $(\mathcal{X}, \mathcal{O})$ is a space $\mathcal{X}$ in which is selected a topology
$\mathcal{O}$; from now on, all spaces under consideration will be topological and
we shall frequently drop "$\mathcal{O}$." A topological subspace thereof $(A, \mathcal{O}_A)$
is a set $A$ in which is selected its induced topology $\mathcal{O}_A$ which consists of
all the intersections of open sets with $A$ and is, clearly, a topology in
$A$. It is important to distinguish the properties of $A$ considered as a
set in $(\mathcal{X}, \mathcal{O})$ from those of $A$ considered as a topological subspace of
$(\mathcal{X}, \mathcal{O})$.

To every set $A$ there are assigned an open set $A^\circ$ and a closed set $\bar{A}$,
as follows. The interior $A^\circ$ of $A$ is the maximal open set contained in
$A$, that is, the union of all open sets in $A$; in particular, if $A$ is open,
then $A^\circ = A$. The adherence $\bar{A}$ of $A$ is the minimal closed set
containing $A$, that is, the intersection of all closed sets containing $A$; in
particular, if $A$ is closed, then $\bar{A} = A$. The definitions of interiors and
adherences of $A$ and $A^c$ are clearly dual, so that

$$(A^\circ)^c = \overline{A^c}, \qquad (A^c)^\circ = (\bar{A})^c.$$

In topological spaces relations between sets and points are described
in terms of neighborhoods. Every set containing a nonempty open
set is a neighborhood of any point $x$ of this open set; the symbol $V_x$
will denote a neighborhood of $x$. The points of the interior $A^\circ$ of $A$
are "interior" to $A$; in other words, $x$ is interior to $A$ if $A$ is a $V_x$. The
points of the adherence $\bar{A}$ of $A$ are adherent to $A$; in other words, $x$ is
adherent to $A$ if no $V_x$ is disjoint from $A$, that is, $x \notin (A^c)^\circ = (\bar{A})^c$.
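
Interiors and adherences can be computed literally from their definitions when the space is finite. The sketch below uses a three-point space with an illustrative topology (the names `SPACE`, `OPEN`, and the helpers are assumptions) and checks the two duality relations for one set.

```python
def interior(a, open_sets):
    """Union of all open sets contained in a."""
    result = frozenset()
    for o in open_sets:
        if o <= a:
            result |= o
    return result

def adherence(a, open_sets, space):
    """Intersection of all closed sets (complements of open sets) containing a."""
    result = space
    for o in open_sets:
        closed = space - o
        if a <= closed:
            result &= closed
    return result

fs = frozenset
SPACE = fs({1, 2, 3})
OPEN = [fs(), fs({1}), fs({1, 2}), SPACE]    # a topology on SPACE
A = fs({2, 3})

duality_1 = SPACE - interior(A, OPEN) == adherence(SPACE - A, OPEN, SPACE)
duality_2 = interior(SPACE - A, OPEN) == SPACE - adherence(A, OPEN, SPACE)
```

In this topology $A = \{2, 3\}$ is closed with empty interior, so it equals its adherence while no point is interior to it.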


Classical analysis is concerned primarily with continuous functions
on euclidean lines to euclidean lines. In general, a function $X$ on a
topological domain $\Omega$ to a topological range space $\mathcal{X}$ is continuous at
$\omega \in \Omega$ if the inverse images of neighborhoods of $x = X(\omega)$ are
neighborhoods of $\omega$; $X$ is continuous (on $\Omega$) if it is continuous at every $\omega \in \Omega$.
Since taking inverse images preserves all set operations, it follows
readily that we can limit ourselves to open (closed) sets. Thus $X$ is
continuous if, and only if, the inverse images of open (closed) sets are
open (closed) and, hence, a continuous function induces on its domain
a topology contained in (no "finer" than) that of the domain.
Therefore, if in topological spaces the $\sigma$-fields of measurable sets are selected to
be the minimal $\sigma$-fields over the topologies, then continuous functions are
measurable. The importance of the concept of continuity is
emphasized by the fact that two spaces $\mathcal{X}$ and $\mathcal{X}'$ are considered to be
"topologically equivalent" if, and only if, there exists a one-to-one
correspondence $X$ on $\mathcal{X}$ to $\mathcal{X}'$ such that $X$ and $X^{-1}$ are continuous.


The basic concept which distinguishes classical analysis from classical 
algebra and which gave rise to the various concepts examined in this 
section is that of limit of sequences of numbers. In a topological space 
it becomes: x 1s limit of a sequence x, or the sequence *, converges to x 
if, for every V, there exists an integer m(/’,) such that x, € V, for 
all x 2 n(V,). However, the need for a more general concept of limit 
is already apparent in the classical theory of integration where the par- 
titions of the interval of integration form a “direction” and the Riemann 
sums form a “directed set” of numbers of which the Riemann integral,
if it exists, is the “limit.” It so happens that this type of limit is pre- 
cisely the one required for general topological spaces, and we now de- 
fine the foregoing terms; the role of sequences in some species of spaces 
(including the euclidean ones) will be better understood when consid- 
ered within the general setup. 

Let $T$ be a set of points $t$, with or without indices. $T$ is partially
ordered if a partial ordering is defined on it. A partial ordering “$\prec$,”
to be read “precedes,” is a binary relation which is transitive ($t \prec t'$
and $t' \prec t''$ imply $t \prec t''$), reflexive ($t \prec t$), and such that, if $t \prec t'$ and
$t' \prec t$, then $t = t'$; upon writing $t' \succ t$ when $t \prec t'$, the relation “$\succ$,”
to be read “follows,” is also a partial ordering. $T$ is a direction if it is
partially ordered and if every pair $t$, $t'$ is followed by some $t''$ ($t \prec t''$,
$t' \prec t''$). $T$ is linearly ordered, and a fortiori is a direction, if every pair
$t$, $t'$ is ordered (either $t \prec t'$ or $t' \prec t$). For example, the sets in a space
are partially ordered by the relation of inclusion and the neighborhoods
of a point $x$ form a direction (this is the root of the definition of limit
as given below); the finite partitions of an interval of integration form
a direction when ordered by the relation of refinement; integers and,
in general, sets of numbers are linearly ordered by the relation “$\le$,”
etc.
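
The definitions of partial ordering and direction can be checked mechanically on a finite set. The following is a minimal sketch, not part of the text: the helper names `is_partial_order` and `is_direction`, and the choice of divisibility as the relation "precedes," are assumptions made only for illustration.

```python
from itertools import product

def is_partial_order(points, precedes):
    """Check that `precedes` is reflexive, antisymmetric and transitive on a finite set."""
    pts = list(points)
    reflexive = all(precedes(t, t) for t in pts)
    antisymmetric = all(
        not (precedes(s, t) and precedes(t, s)) or s == t
        for s, t in product(pts, repeat=2)
    )
    transitive = all(
        not (precedes(s, t) and precedes(t, u)) or precedes(s, u)
        for s, t, u in product(pts, repeat=3)
    )
    return reflexive and antisymmetric and transitive

def is_direction(points, precedes):
    """A direction: partially ordered, and every pair is followed by some element."""
    pts = list(points)
    return is_partial_order(pts, precedes) and all(
        any(precedes(s, u) and precedes(t, u) for u in pts)
        for s, t in product(pts, repeat=2)
    )

def divides(s, t):
    # hypothetical choice of "s precedes t": s divides t
    return t % s == 0

divisors_of_12 = [1, 2, 3, 4, 6, 12]  # every pair is followed by 12
no_common_follower = [1, 2, 3]        # 2 and 3 are followed by no common element

print(is_direction(divisors_of_12, divides))
print(is_partial_order(no_common_follower, divides))
print(is_direction(no_common_follower, divides))
```

Under these assumptions the first set is a direction, while the second is partially ordered without being a direction; a linearly ordered set such as the integers 0 to 4 under the usual order passes both checks.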

A function $X$ on $T$ to $\mathcal{X}$ can be represented by the indexed set $\{x_t\}$ of
its values which may or may not be distinct but which are always
distinguished by their indices $t$. The indexed set $\{x_t\}$ is directed if $T$ is a
direction; sequences $\{x_n\}$ are special directed sets representing
functions on the (linearly ordered) set of positive integers. We are now
ready to define the general concept of limit.

The point $x$ is the limit of a directed set $\{x_t\}$ and we write $x = \lim x_t$
or, equivalently, $x_t$ converges to $x$ and we write $x_t \to x$, if, for every
$V_x$, there exists an index $t(V_x)$ such that $x_t \in V_x$ for all those indices
which follow $t(V_x)$. However, the concept of limit is of use only if,
when the limit exists, it is unique; this requirement leads to the
introduction of “separated” or “Hausdorff” space as follows:


A. SEPARATION THEOREM. The following three definitions are equivalent.
A topological space is separated if


(S1) every directed set has at most one limit,

(S2) every pair of distinct points has disjoint neighborhoods,

(S3) the intersection of all closed neighborhoods of a point reduces to this
point.


The term “separated” expresses property (S2).

We observe that, according to (S3), in a separated space every set
reduced to a point is closed.

Proof. (S1) and (S2) are equivalent. Let $x \ne y$. If $x_t \to x$ and
$x_t \to y$, then $x_t \in V_x \cap V_y$ for all those $t$ which follow both $t(V_x)$ and
$t(V_y)$; since $T$ is a direction such $t$ exist so that no pair $V_x$, $V_y$ is
disjoint.

Conversely, if no pair $V_x$, $V_y$ is disjoint, then there exist points
$z(V_x, V_y) \in V_x \cap V_y$ and, since these pairs form a direction when
ordered by the relation $(V_x, V_y) \prec (V'_x, V'_y)$ if $V_x \supset V'_x$ and $V_y \supset V'_y$,
these points form a directed set converging to both $x$ and $y$.

(S2) and (S3) are equivalent. If for every $y \ne x$ there exists a $V_x$
such that $y \notin \bar{V}_x$, then the intersection of all $\bar{V}_x$ reduces to $x$.
Conversely, if the intersection of all $\bar{V}_x$ reduces to the set formed by $x$,
then, for every $y \ne x$, there exists a $V_x$ such that $y \notin \bar{V}_x$, and the open
set $(\bar{V}_x)^c$ is a neighborhood of $y$ disjoint from $V_x$. The proof is
terminated.

From now on, all spaces will be separated spaces.

2.2 Limit points and compact spaces. Analysis of concepts or
properties leads to the introduction of “weaker” ones. A property $\mathcal{P}$ is
weaker than a property $\mathcal{P}'$ if $\mathcal{P}'$ implies $\mathcal{P}$; $\mathcal{P}$ is a necessary condition for
$\mathcal{P}'$ and $\mathcal{P}'$ is a sufficient condition for $\mathcal{P}$.

Perhaps even more basic than the concept of limit is the weaker one 
of limit point. A point $x$ is a limit point of the directed set $\{x_t\}$ if,
for every pair $t$, $V_x$, there exists some $t' \succ t$ such that $x_{t'} \in V_x$. The
definitions of limit and of limit point yield at once (i) and (ii) of the 
proposition below, and then (iii) follows. 


a. Let the sets $A_t$ be formed by all those points $x_{t'}$ for which $t'$ follows $t$:
$A_t = \{x_{t'}: t' \succ t\}$.


(i) $x_t \to x$ if, and only if, for every $V_x$ there exists an $A_t \subset V_x$.
(ii) $x$ is a limit point of $\{x_t\}$ if, and only if, no pair $A_t$, $V_x$ is disjoint.
(iii) the set of all limit points of $\{x_t\}$ coincides with the intersection of all
$\bar{A}_t$, and if $x_t \to x$ then this set reduces to the single point $x$.


The reason for the somewhat confusing terminology above is that
every limit point of $\{x_t\}$ is the limit of some subset of $\{x_t\}$, in the
following sense. A direction $S$ of elements $s$, $s'$, $\cdots$ is a subdirection of
the direction $T$ when there exists a function $f$ on $S$ to $T$ with the
property that, for every $t$, there is an $s$ such that, if $s'$ follows $s$, then
$t' = f(s')$ follows $t$. The set $\{x_{f(s)}\}$ directed by the subdirection $S$ of $T$ is a
subdirected set. Clearly, if $x_t \to x$, then every subdirected set $x_{f(s)} \to x$.


b. A point $x$ is a limit point of a directed set $\{x_t\}$ if, and only if, the
set contains a subdirected set which converges to $x$.


Proof. The “if” assertion follows at once from the definitions. As
for the “only if” assertion, it suffices for every pair $s' = (t, V_x)$ to
take $f(s') = t' \succ t$ such that $x_{t'} \in V_x$ and direct the pairs by
$(t_1, V'_x) \succ (t_2, V''_x)$ when $t_1 \succ t_2$ and $V'_x \subset V''_x$.


Compact spaces are separated spaces in which every directed set has 
at least one limit point; a set is compact if it is compact in its induced 
topology. Compactness plays a prominent role in analysis and it is 
important to have equivalent characterizations of compact spaces. We 
shall use repeatedly the following terminology: a subclass of open sets 
is an open covering of a set if every point of the set belongs to at least 
one of the sets of the subclass.

A. COMPACTNESS THEOREM. The following three properties of separated
spaces are equivalent:


(C1) BOLZANO-WEIERSTRASS PROPERTY: every directed set has at least
one limit point.

(C2) HEINE-BOREL PROPERTY: every open covering of the space contains
a finite covering of the space.

(C3) INTERSECTION PROPERTY: every class of closed sets such that all its
finite subclasses have nonempty intersections has itself a nonempty
intersection.


If some class has the property described in (C3), we say that it has the 
finite intersection property. 


Proof. The intersection property means by contradiction that every 
class of closed sets whose intersection is empty contains a finite subclass
whose intersection is empty. Thus, it is the dual of the Heine-
Borel property, and it suffices to show that it is equivalent to the Bol- 
zano-Weierstrass one. 

Let $\{x_t\}$ be a directed set and, for every $t_0 \in T$, consider the
adherence of the set of all the $x_t$ with $t$ following $t_0$. Since $T$ is a direction,
these adherences form a class of closed sets with finite intersection
property. Thus, if the intersection property is true, then there exists
an $x$ common to all these adherences and it follows that $x$ is a limit
point of $\{x_t\}$.

Conversely, consider a class of closed sets with the finite intersection 
property and adjoin all finite intersections to the class. The class so 
obtained is directed by inclusion so that, by selecting a point from every 
set of this class, we obtain a directed set. If the Bolzano-Weierstrass 
property is true, then this set has a limit point and this point belongs 
to every set of the class; hence the intersection of the class is not empty. 
This completes the proof. 


COMPACTNESS PROPERTIES. 1° In a compact space, a directed set
$x_t \to x$ if, and only if, $x$ is its unique limit point.


Proof. We use a and its notations. The “only if” assertion holds
by a(iii). As for the “if” assertion, if $x_t \not\to x$ then, by a(i), there
exists a $V_x$ such that no $A_t$ is disjoint from $V_x^c$; thus, for every $t$ we can
select a $t' \succ t$ such that $x_{t'} \in A_t \cap V_x^c$. Since the space is compact,
the subdirected set $\{x_{t'}\}$, hence, by b, the directed set $\{x_t\}$, has a limit
point $x' \in V_x^c$. Therefore, $x \ne x'$ and $x$ cannot be the unique limit
point of $\{x_t\}$.


2° Every compact set is closed, and in a compact space the converse is
true.


Proof. Let $A$ be compact and let $V_x$, $V_y(x)$ be a disjoint pair of
open neighborhoods of $x \in A$ and $y \in A^c$. By the Heine-Borel
property, the open covering $\{V_x\}$ of $A$ where $x$ ranges over $A$ contains a
finite subcovering $\{V_{x_k}\}$, and the disjoint open sets

$$V = \bigcup_k V_{x_k}, \qquad V' = \bigcap_k V_y(x_k)$$

are such that $A \subset V$ and $y \in V'$. Thus, the open neighborhood
$V'$ of $y$ contains no points of $A$; hence $y \notin \bar{A}$. Since $y \in A^c$
is arbitrary, it follows that $A^c$ and $\bar{A}$ are disjoint, and the first
assertion is proved. The second assertion follows readily from the
intersection property.


3° The intersection of a nonincreasing sequence of nonempty compact 
sets is not empty.


Apply the intersection property. 


4° The range of a continuous function on a compact domain is com- 
pact. 


Proof. Because of continuity of the function, the inverse image of 
every open covering of the range is an open covering of the compact 
domain; hence it contains a finite open subcovering which is the inverse 
image of a finite open subcovering of the range. Thus, the range has 
the Heine-Borel property, and the assertion is proved. 


The euclidean line $R = (-\infty, +\infty)$ is not compact but, according to
the Bolzano-Weierstrass or Heine-Borel theorems, every closed
interval $[a, b]$ is compact. These theorems become valid for the whole line
if it is “extended,” that is, if points $-\infty$ and $+\infty$ are added. Thus,
the extended euclidean line $\bar{R} = [-\infty, +\infty]$ is compact. In fact, $R$ is
locally compact and every locally compact space can be compactified
by adding one point only, as below.

A separated space is locally compact if every point has a compact
neighborhood; it is easily shown that every neighborhood then contains
a compact one. The one-point compactification of a separated space
$(\mathcal{X}, \mathcal{O})$ is as follows. Adjoin to the points of $\mathcal{X}$ an arbitrary point $\infty \notin \mathcal{X}$
and adjoin to the open sets all sets obtained by adjoining to the point
$\infty$ those open sets whose complements are compact. Denote the
topological space so obtained by $(\mathcal{X}_\infty, \mathcal{O}_\infty)$.

5° The one-point compactification of a locally compact but not
compact space is a compact space, and the induced topology of the original
space is its original topology.


Proof. The last assertion follows at once from the definition of $\mathcal{O}_\infty$.
As for the first assertion, observe that the new space is separated, since
two distinct points belonging to the separated original space are
separated and the point $\infty$ is separated from any $x \in \mathcal{X}$ by taking a
compact and hence closed $V_x \subset \mathcal{X}$, so that $\infty \in V_x^c$. Also, the new space
has the Heine-Borel property, since an open covering of it has a member
$O + \{\infty\}$ with $O^c$ compact and hence contains a finite subcovering of $O^c$
which, together with $O + \{\infty\}$, is a finite subcovering of the new space.

2.3 Countability and metric spaces. The euclidean line possesses 
many countability properties, among them separability (the countable 
set of rationals is dense in it) and a countable base (the countable class 
of all intervals with rational extremities); this permits us to define limits 
in terms of sequences only. In general topological spaces, a set $A$ is
dense in $B$ if $\bar{A} \supset B$; in other words, taking for simplicity $B = \mathcal{X}$, $A$ is
dense in $\mathcal{X}$ if no neighborhood is disjoint from $A$; and $B$ is separable if
there exists a countable set $A$ dense in $B$. A countable base at $x$ is a
countable class $\{V_x(j)\}$ of neighborhoods of $x$ such that every
neighborhood of $x$ contains a $V_x(j)$; and the space has a countable base $\{V(j)\}$
if, for every point $x$, a subclass of $V(j)$'s is a base at $x$.


a. A space has a countable base only if it is separable and has a countable
base at every point. Then every open covering of the space contains a 
countable covering of the space. 


Note that if a countable set $\{x_j\}$ is dense in a metric space, then at
every $x_j$ there is a countable base of spheres of rational radii, and the
countable union of all these countable bases is a base for the space. 

Proof. If the space has a countable base $\{V(j)\}$, then it has a
countable base at every point. Moreover, if $A$ is a set formed by selecting a
point $x_j$ from every $V(j)$, then, since any neighborhood of any point
contains a $V(j)$, it contains the corresponding point $x_j$, so that no
neighborhood is disjoint from $A$.

Finally, given an open covering of the space, every one of its sets
contains a $V(j)$ so that, for every $V(j)$, we can select one set $O_j$ of the
covering containing it. The countable class $\{O_j\}$ is an open covering
of the space, and the proof is terminated.


A basic type of space with a countable base at every point is that of
metric spaces. In fact, topologies in euclidean spaces are determined
by means of distances; this approach characterizes metric spaces. A
metric space is a space with a distance (or metric) $d$ on $\mathcal{X} \times \mathcal{X}$ to $R$ such
that, whatever be the points $x$, $y$, $z$, this function has


the triangle property: $d(x, y) + d(x, z) \ge d(y, z)$,
the identification property: $d(x, y) = 0 \iff x = y$.


Upon replacing $z$ by $x$ and interchanging $x$ and $y$, it follows that
$d(x, y) = d(y, x)$, $d(x, y) \ge 0$.
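
The remark that symmetry and nonnegativity follow from the two stated properties can be illustrated on a finite set of points. The sketch below is only an illustration under assumptions: the helper names and the two candidate functions `euclid` and `skewed` are hypothetical and do not come from the text.

```python
from itertools import product

def has_triangle_property(points, d):
    # the form used in the text: d(x, y) + d(x, z) >= d(y, z) for all x, y, z
    return all(d(x, y) + d(x, z) >= d(y, z) for x, y, z in product(points, repeat=3))

def has_identification_property(points, d):
    # d(x, y) = 0 if, and only if, x = y
    return all((d(x, y) == 0) == (x == y) for x, y in product(points, repeat=2))

def is_symmetric_and_nonnegative(points, d):
    return all(d(x, y) == d(y, x) and d(x, y) >= 0 for x, y in product(points, repeat=2))

def euclid(x, y):
    return abs(x - y)

def skewed(x, y):
    # hypothetical asymmetric candidate: going "up" costs twice as much as going "down"
    return 2 * (y - x) if y > x else x - y

points = [0, 1, 3, 7]

print(has_triangle_property(points, euclid), has_identification_property(points, euclid))
print(is_symmetric_and_nonnegative(points, euclid))
print(has_triangle_property(points, skewed))
```

The asymmetric candidate has the identification property but fails the triangle property in the form stated above (take $z = x$), which is consistent with the remark that this form already forces symmetry.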


It happens frequently, and we shall encounter repeatedly such cases, 
that, for some space, a function d with the two foregoing properties 
can be defined, except for the property $d(x, y) = 0 \Rightarrow x = y$. Then the
usual procedure is to identify all points $x$, $y$ such that $d(x, y) = 0$; the
space is replaced by the space of “classes of equivalence” so obtained, 
and this new space is metrized by d. 


The topology of a metric space $(\mathcal{X}, d)$ is defined as follows: Let the
sphere $V_x(r)$ with “center” $x$ and “radius” $r$ ($> 0$) be the set of all points $y$
such that $d(x, y) < r$. A set $A$ is open if, for every $x \in A$, there exists
a sphere $V_x(r) \subset A$; it follows, by the triangle property, that every
sphere is open. Clearly, the class of open sets so defined is a topology.
Since, by the identification property, $d(x, y) > 0$ when $x \ne y$ and the
spheres $V_x(r)$ and $V_y(s)$ are disjoint for $0 < r, s \le \frac{1}{2} d(x, y)$, it follows
that with the metric topology so defined, the space is separated; we
observe that $x_n \to x$ means that $d(x_n, x) \to 0$.

A basic property of the metric topology is that at every point $x$ there
is a countable base, say, the sequence of spheres $V_x(1/n)$, $n = 1, 2, \cdots$,
and it is to be expected that properties of metric spaces can be
characterized in countable terms. To begin with:


1. Sequences can converge to at most one point.

2. A point $x \in \bar{A}$ if, and only if, $A$ contains a sequence $x_n \to x$, so
that a set is closed if, and only if, limits of all convergent sequences of its
points belong to it.

3. Every closed (open) set is a countable intersection (union) of open
(closed) sets.

4. A metric space has a countable base if, and only if, it is separable.

5. If $X$ is a function on a metric domain $(\Omega, \rho)$ to a metric space $(\mathcal{X}, d)$,
then $X(\omega') \to X(\omega)$ as $\omega' \to \omega$ if, and only if, $X(\omega_n) \to X(\omega)$ whatever
be the sequence $\omega_n \to \omega$.

Proof. The first assertion follows from the separation theorem.
The “if” part of the second assertion is immediate, and for the “only
if” part it suffices to take $x_n \in A \cap V_x(1/n)$.

For the third assertion, form the open sets $O_n = \bigcup_{x \in A} V_x(1/n)$; those
sets contain $A$, so that $A \subset \bigcap O_n$. On the other hand, for every
$x \in \bigcap O_n$ there exist points $x_n \in A$ such that $x \in V_{x_n}(1/n)$, and
hence $x_n \to x$; since $A$ is closed, it follows by the second assertion that
$x \in A$, and hence $A \supset \bigcap O_n$. Thus, closed $A = \bigcap O_n$ and the dual
assertion for open sets follows by complementations.

The fourth assertion follows from a.

Finally, if $X(\omega') \to X(\omega)$ as $\omega' \to \omega$, then, clearly, $X(\omega_n) \to X(\omega)$
as $\omega_n \to \omega$. Since $X(\omega') \not\to X(\omega)$ as $\omega' \to \omega$ implies that there exist
points $\omega_n \in V_\omega(1/n)$ such that $X(\omega_n) \not\to X(\omega)$, while $\omega_n \to \omega$, the last
assertion follows.

Metric completeness and compactness. The basic criterion for
convergence of numerical sequences is the (Cauchy) mutual convergence
criterion: a sequence $x_n$ is mutually convergent, that is, $d(x_m, x_n) \to 0$
as $m, n \to \infty$ if, and only if, the sequence $x_n$ converges. In a metric
space, if $x_n \to x$, then, by the triangle inequality, $d(x_m, x_n) \le d(x, x_m)
+ d(x, x_n) \to 0$ as $m, n \to \infty$, but the converse is not necessarily true
(take the space of all rationals with euclidean distance); if it is true,
that is, if $d(x_m, x_n) \to 0$ implies that $x_n \to$ some $x$, then the mutual
convergence criterion is valid, and we say that the space is complete.
Complete metric spaces have many important properties, which follow.
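
The parenthetical example of the rationals can be made concrete. The sketch below is an illustration and not part of the text: it assumes the Newton iteration $x_{n+1} = (x_n + 2/x_n)/2$ as one convenient rational sequence, and the function name `newton_sqrt2` is hypothetical. The iterates stay rational and cluster together, yet no rational number has square 2, so the sequence has no limit within the rationals.

```python
from fractions import Fraction

def newton_sqrt2(n_terms):
    """Rational Newton iterates x_{n+1} = (x_n + 2/x_n)/2, starting from x_1 = 1."""
    x = Fraction(1)
    terms = [x]
    for _ in range(n_terms - 1):
        x = (x + 2 / x) / 2
        terms.append(x)
    return terms

terms = newton_sqrt2(6)
# distances between consecutive terms, computed exactly in the rationals
gaps = [abs(terms[k + 1] - terms[k]) for k in range(len(terms) - 1)]

for x, gap in zip(terms[1:], gaps):
    print(x, float(gap), float(x * x - 2))
```

The printed distances shrink rapidly and the squares approach 2 without ever equalling it. This only illustrates the failure of completeness; it is not a proof of mutual convergence.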

Call $\Delta(A) = \sup_{x, y \in A} d(x, y)$ the diameter of $A$; $A$ is bounded if $\Delta(A)$ is
finite.


A. CANTOR’S THEOREM. In a complete metric space, every nonincreasing
sequence of closed nonempty sets $A_n$ such that the sequence of their
diameters $\Delta(A_n)$ converges to 0 has a nonempty intersection consisting of
one point only.


Proof. Take $x_n \in A_n$ and $m \ge n$. Since $d(x_m, x_n) \le \Delta(A_n) \to 0$,
it follows that $x_n \to$ some $x$. Since $x_m \in A_m \subset A_n$ for all $m \ge n$ and
the set $A_n$ is closed, $x$ belongs to every $A_n$; hence $x \in \bigcap A_n$. If now
$d(x, x') > 0$, then, from some $k$ on, $d(x, x') > \Delta(A_k)$ so that
$x' \notin A_k \supset \bigcap A_n$. The assertion is proved.

A set $A$ is nowhere dense if the complement of $\bar{A}$ is dense in the space,
or, equivalently, if $\bar{A}$ contains no spheres, that is, if the interior of $\bar{A}$
is empty. A set is of the first category if it is a countable union of
nowhere dense sets, and it is of the second category if it is not of the first
category.


B. BAIRE’S CATEGORY THEOREM. Every complete metric space is of the
second category. 


Proof. Let $A = \bigcup A_n$ where the $A_n$ are nowhere dense sets. There
exist a point $x_1 \notin \bar{A}_1$ and a positive $r_1 < 1$ such that the adherence of
$V_{x_1}(r_1)$ is disjoint from $A_1$. Proceeding by recurrence, we form a
decreasing sequence of spheres $V_{x_n}(r_n)$ such that $\bar{V}_{x_n}(r_n)$ is disjoint from
$A_n$ and $r_n < 1/n \to 0$. Therefore, by Cantor’s theorem, there exists a
point $x \in \bigcap \bar{V}_{x_n}(r_n)$ and, because of the foregoing disjunction, $x \notin \bigcup A_n$.
Thus $A \ne \mathcal{X}$, and the theorem follows.


We investigate now compact metric spaces and require the two fol- 
lowing propositions. 


b. If every mutually convergent sequence contains a convergent subse- 
quence, then the space is complete. 


This follows from the fact that if a sequence $x_n$ is mutually convergent
and contains a convergent subsequence $x_{n'} \to x$, then, by the triangle
inequality, $d(x_n, x) \le d(x_{n'}, x_n) + d(x_{n'}, x) \to 0$ as $n, n' \to \infty$, so
that $x_n \to x$.

A set is totally bounded if, for every $\epsilon > 0$, it can be covered by a
finite number of spheres of radii $\le \epsilon$. Clearly, a totally bounded set
is bounded, and a subset of a totally bounded set is totally bounded. 


c. A metric space is totally bounded if, and only if, every sequence of
points contains a mutually convergent subsequence. A totally bounded 
metric space has a countable base. 


Proof. Let the space be not totally bounded; there exists an $\epsilon > 0$
such that the space cannot be covered by finitely many spheres of radii
$\le \epsilon$. We can select by recurrence a sequence of points $x_n$ whose
mutual distances are $\ge \epsilon$; for, if there is only a finite number of points
$x_1, \cdots, x_m$ with this property, then the spheres of radius $\epsilon$ centered
at these points cover the space. Clearly, this sequence cannot contain
a mutually convergent subsequence.

Conversely, let the space be totally bounded, so that every set is
totally bounded. Then any sequence of points belonging to a set
contains a subsequence contained in a sphere of radius $\le \epsilon$, a member of a
finite covering of the set by spheres of radii $\le \epsilon$. Thus, given a
sequence $\{x_n\}$, setting $\epsilon = 1, \frac{1}{2}, \cdots$, and proceeding by recurrence, we
obtain subsequences such that each is contained in the preceding one
and the $k$th one is formed by points $x_{1k}, x_{2k}, \cdots$ belonging to a sphere
of radius $\le 1/k$. The “diagonal” subsequence $\{x_{nn}\}$ is such that, from
the $k$th term on, the mutual distances are $\le 2/k$; hence this subsequence
is mutually convergent.


The last assertion follows from the fact that given a totally bounded
space, the class formed by all finite coverings by spheres of radii $\le 1/n$,
$n = 1, 2, \cdots$ is a countable base.
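
The recurrence used in the first half of the proof (keep selecting points whose mutual distances are at least $\epsilon$ until the spheres around them cover everything) can be sketched on a finite set of points. This is an illustration under assumptions, not the author's construction: the function names, the grid in the unit square, and the value 0.35 for $\epsilon$ are chosen here only for the example.

```python
import math

def greedy_centers(points, eps, dist):
    """Select points one by one, each at distance >= eps from all earlier selections."""
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

def covers(points, centers, eps, dist):
    """True if every point lies in some sphere {y: dist(y, c) < eps} around a center."""
    return all(any(dist(p, c) < eps for c in centers) for p in points)

# hypothetical data: an 11 x 11 grid in the unit square with euclidean distance
grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
eps = 0.35
centers = greedy_centers(grid, eps, math.dist)

print(len(centers), covers(grid, centers, eps, math.dist))
```

Every point that is not selected lies within $\epsilon$ of an earlier selection, so the spheres of radius $\epsilon$ around the selected points cover the set. On a finite set the selection necessarily stops; in a space that is not totally bounded it can be continued indefinitely, which is what the proof exploits.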


C. METRIC COMPACTNESS THEOREM. The three following properties of
a metric space are equivalent:


(MC1) every sequence of points contains a convergent subsequence;

(MC2) every open covering of the space contains a finite covering of the
space (Heine-Borel property);

(MC3) the space is totally bounded and complete.


Proof. It suffices to show that (MC2) $\Rightarrow$ (MC1) $\Rightarrow$ (MC3) $\Rightarrow$
(MC2).

(MC2) $\Rightarrow$ (MC1). Apply the compactness theorem.

(MC1) $\Rightarrow$ (MC3). Let every sequence of points contain a convergent
(hence mutually convergent) subsequence. Then, by b, the space is
complete and by c, it is also totally bounded.

(MC3) $\Rightarrow$ (MC2). According to a, an open covering of a totally
bounded space contains a countable covering $\{O_j\}$ of the space. If no
finite union of the $O_j$ covers the space, then, for every $n$, there exists a
point $x_n \notin \bigcup_{j=1}^{n} O_j$, and, according to c, the sequence of these points
contains a mutually convergent subsequence. Therefore, when the totally
bounded space is also complete, this sequence has a limit point $x$ which
necessarily belongs to some set $O_{j_0}$ of the open countable covering of
the space. Since $x$ is a limit point of the sequence $\{x_n\}$, there exists
some $n > j_0$ such that $x_n \in O_{j_0} \subset \bigcup_{j=1}^{n} O_j$, and we reach a contradiction.
Thus, there exists a finite subcovering of the space.

COROLLARY 1. A compact metric space is bounded and separable.


COROLLARY 2. A continuous function $X$ on a compact metric space
$(\Omega, \rho)$ to a metric space $(\mathcal{X}, d)$ is uniformly continuous.


By definition, $X$ is uniformly continuous if for every $\epsilon > 0$ there exists
a $\delta = \delta(\epsilon) > 0$, which depends only upon $\epsilon$, such that $d(X(\omega), X(\omega')) < \epsilon$
for $\rho(\omega, \omega') < \delta$.

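Proof. Let $\epsilon > 0$. Since $X$ is continuous, for every $\omega \in \Omega$ there
exists a $\delta_\omega$ such that $d(X(\omega), X(\omega')) < \epsilon/2$ for $\rho(\omega, \omega') < 2\delta_\omega$. Since the
domain is compact, it is covered by a finite number of spheres $V_{\omega_k}(\delta_{\omega_k})$,
$k = 1, 2, \cdots, n$; let $\delta$ be the smallest of their radii. Any $\omega$ belongs to
one of these spheres, say, $V_{\omega_k}(\delta_{\omega_k})$, and if $\rho(\omega, \omega') < \delta$, then
$\rho(\omega_k, \omega') < 2\delta_{\omega_k}$. It follows, by the triangle inequality, that

$$d(X(\omega), X(\omega')) \le d(X(\omega_k), X(\omega)) + d(X(\omega_k), X(\omega')) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$

whenever $\rho(\omega, \omega') < \delta$, and the corollary is proved.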


Let us indicate how a noncomplete metric space (X, d) can be com- 
pleted, that is, can be put in a one-to-one isometric correspondence with 
a set in a complete metric space—in fact, with a set dense in the latter 
space. The elementary computations will be left to the reader. 

Consider all mutually convergent sequences $s = (x_1, x_2, \cdots)$,
$s' = (x'_1, x'_2, \cdots)$, $\cdots$. The function $\rho$ defined by $\rho(s, s') = \lim d(x_n, x'_n)$
exists and is finite and satisfies the triangular inequality. Let $s$, $s'$ be
equivalent if $\rho(s, s') = 0$; this notion is symmetric, transitive, and
reflexive. It follows that the space $(S, \rho)$ of all such equivalence classes
is a metric space, and it is easily seen that it is complete. The one-to-one
correspondence between $\mathcal{X}$ and the set $S'$ of classes of equivalence
of all “constant sequences,” defined by $x \leftrightarrow (x, x, \cdots)$, preserves the
distances. Moreover, $S'$ is dense in $S$. Thus $S$ may be considered as
a “minimal completion” of $\mathcal{X}$.

Distance of sets. In what follows the sets under consideration are
nonempty subsets of a metric space $(\mathcal{X}, d)$. The distance of two sets $A$ and $B$
is defined by

$$d(A, B) = \inf \{d(x, y): x \in A, y \in B\}$$

and

$$d(x, B) = d(\{x\}, B) = \inf \{d(x, y): y \in B\}$$

is called the distance of $x$ to $B$. Clearly there are sequences of points
$x_n \in A$ and $y_n \in B$ such that $d(x_n, y_n) \to d(A, B)$ and in particular
$d(x, y_n) \to d(x, B)$.


d. $d(x, A)$ is uniformly continuous in $x$ and, in fact,
$|d(x, A) - d(y, A)| \le d(x, y)$.


For, upon taking infima in $z \in A$ in the triangle inequality $d(x, z) \le d(x, y) +
d(y, z)$, we obtain $d(x, A) \le d(x, y) + d(y, A)$ and interchanging $x$ and $y$
the asserted inequality follows.
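
For finite sets the infima above are minima, so the definitions and the inequality of d can be checked directly. The sketch below is only an illustration: the function names and the particular sets of integers on the line are assumptions made for the example.

```python
def dist_to_set(x, B, d):
    """d(x, B): the infimum of d(x, y) over y in B (a minimum, since B is finite here)."""
    return min(d(x, y) for y in B)

def dist_between_sets(A, B, d):
    """d(A, B): the infimum of d(x, y) over x in A and y in B."""
    return min(d(x, y) for x in A for y in B)

def d(x, y):
    # euclidean distance on the line
    return abs(x - y)

A = [0, 1, 4]   # hypothetical finite sets of integers
B = [6, 9]

print(dist_to_set(5, A, d))        # distance of the point 5 to A
print(dist_between_sets(A, B, d))  # distance of A and B

# the inequality |d(x, A) - d(y, A)| <= d(x, y) on a range of test points
test_points = range(-5, 15)
lipschitz = all(
    abs(dist_to_set(x, A, d) - dist_to_set(y, A, d)) <= d(x, y)
    for x in test_points for y in test_points
)
print(lipschitz)
```

With these sets the point 5 is at distance 1 from $A$, the two sets are at distance 2, and the inequality holds at every pair of test points.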


D. (i) $\bar{A} = \{x: d(x, A) = 0\}$.
(ii) If disjoint sets $A$ and $B$ are closed then there are disjoint open sets
$U \supset A$ and $V \supset B$ ($\mathcal{X}$ is “normal”) and there is a continuous function $g$
with $0 \le g \le 1$, $g = 0$ on $A$, $g = 1$ on $B$ (“Urysohn lemma”).
(iii) If a compact $A$ and a closed $B$ are disjoint then $d(A, B) > 0$. If
moreover $B$ is also compact then $d(A, B) = d(x, y)$ for some $x \in A$ and
$y \in B$.


Proof. We use continuity in $x$ of $d(x, A)$ without further comment.
The set $A' = \{x: d(x, A) = 0\}$ contains $A$ and is closed as inverse image
of the closed singleton $\{0\}$ under a continuous mapping; hence $A'$ contains
$\bar{A}$. Let a sequence of points $x_n$ of $A$ be such that $d(x, x_n) \to d(x, A)$.
Then $d(x, x_n) \to 0$ for every $x \in A'$ so that $x \in \bar{A}$; hence $A'$ is
contained in $\bar{A}$. Thus (i) is proved.


In (ii), the “normality” assertion follows by (i) and continuity in $x$ of
$d(x, A) - d(x, B)$ upon taking $U = \{x: d(x, A) - d(x, B) < 0\} \supset A$
and $V = \{x: d(x, A) - d(x, B) > 0\} \supset B$. “Urysohn lemma” obtains with

$$g(x) = \frac{d(x, A)}{d(x, A) + d(x, B)}.$$

For (iii), let sequences of points $x_n$ of $A$ and $y_n$ of $B$ be such that
$d(x_n, y_n) \to d(A, B)$. Since $A$ is compact the sequence $(x_n)$ contains a
subsequence $x_{n'} \to x \in A$ hence $d(x, y_{n'}) \to d(A, B)$. If $d(A, B) = 0$
then $y_{n'} \to x$ so that, $B$ being closed, $x \in B$ and $A$ and $B$ are not disjoint.
Since they are disjoint, $d(A, B) > 0$. If, moreover, also $B$ is compact
then the sequence of points $y_{n'}$ of $B$ contains a subsequence $y_{n''} \to y \in B$
hence $d(x, y) = d(A, B)$. The proof is terminated.

2.4 Linearity and normed spaces. Euclidean spaces are not only
metric and complete but are also normed and linear as defined below.
Unless specified, the “scalars” $a$, $b$, $c$, with or without subscripts, are
either arbitrary real numbers or arbitrary complex numbers, and $x$, $y$, $z$,
with or without subscripts, are arbitrary points in a space $\mathcal{X}$.

A space $\mathcal{X}$ is linear if a “linear operation” consisting of operations of
“addition” and “multiplication by scalars” is defined on $\mathcal{X}$ to $\mathcal{X}$ with
the properties:


(i) $x + y = y + x$, $x + (y + z) = (x + y) + z$,
$x + z = y + z \Rightarrow x = y$;
(ii) $1 \cdot x = x$, $a(x + y) = ax + ay$, $(a + b)x = ax + bx$,
$a(bx) = (ab)x$.


By setting $-y = -1 \cdot y$, “subtraction” is defined by $x - y = x + (-y)$.
Elementary computations show that (i) and (ii) imply uniqueness of
the “zero point” or “null point” or “origin” $\theta$, defined by $\theta = 0 \cdot x$, and
with the property $x + \theta = x$. A set in a linear space generates a linear
subspace (the linear closure of the set) by adding to its points $x$, $y$,
$\cdots$ all points of the form $ax + by + \cdots$.

A metric linear space is a linear space with a metric $d$ which is
invariant under translations and makes the linear operations continuous:


(iii) $d(x, y) = d(x - y, \theta)$, $x_n \to \theta \Rightarrow ax_n \to \theta$,
$a_n \to 0 \Rightarrow a_n x \to \theta$.
If
(iv) $d(x, y) = d(x - y, \theta)$, $d(ax, \theta) = |a| \, d(x, \theta)$,


then (iii) holds, $d(x, \theta)$ is called norm of $x$ and is denoted by $\|x\|$, and
the metric linear space is then a “normed linear space.”

Equivalently, a normed linear space is a linear space on which is
defined a norm with values $\|x\| \ge 0$ such that


(v) $\|x + y\| \le \|x\| + \|y\|$, $\|x\| = 0 \iff x = \theta$,
$\|ax\| = |a| \cdot \|x\|$
and the metric $d$ is determined by the norm by setting


$d(x, y) = \|x - y\|$.


A Banach space is a normed linear space complete in the metric
determined by the norm. For example, the space of all bounded continuous
functions $f$ on a topological space $\mathcal{X}$ to the euclidean line is a Banach
space with a norm defined by $\|f\| = \sup_{x \in \mathcal{X}} |f(x)|$. Real spaces with
points $x = (x_1, \cdots, x_N)$ and norms $\|x\| = (|x_1|^r + \cdots + |x_N|^r)^{1/r}$,
$r \ge 1$, are Banach spaces, and we shall encounter similar but more
general spaces $L_r$. If $r = 2$, then these (euclidean) spaces are Hilbert
spaces.

A Hilbert space is a Banach space whose norm has the parallelogram
property: $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$; such a norm
determines a scalar product. It is simpler to determine the Hilbert
norm by means of a scalar product (corresponding to the scalar product
defined by $(x, y) = \sum_{k=1}^{N} x_k y_k$ in a euclidean space $R^N$) as follows:
k=1 


A scalar product is a function on the product of a linear space by
itself to its space of scalars, with values $(x, y)$ such that


(vi) $(ax + by, z) = a(x, z) + b(y, z)$, $(x, y) = \overline{(y, x)}$;
$x \ne \theta \Rightarrow (x, x) > 0$.


Clearly $(x, x)$ is real and nonnegative. The function with values
$\|x\| = (x, x)^{1/2} \ge 0$ is the Hilbert norm determined by the scalar
product. For, obviously, it has the two last properties (v) of a norm. And
it also has the first property (v). This follows by using in the expansion
of $(x + y, x + y)$ the Schwarz inequality


$|(x, y)| \le \|x\| \cdot \|y\|$;


when $(x, y) = 0$ this inequality is trivially true, and when $(x, y) \ne 0$
it is obtained by expanding $(x - ay, x - ay) \ge 0$ and setting $a =
(x, x)/(y, x)$. Finally, the parallelogram property is immediate.
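
A small numerical sketch may help to see how the parallelogram property singles out $r = 2$ among the norms $(|x_1|^r + \cdots + |x_N|^r)^{1/r}$, and to check the Schwarz inequality on an example. It is not part of the text: the function names are hypothetical, the vectors are chosen only for illustration, and the complex scalar product $\sum_k x_k \bar{y}_k$ is an assumed extension of the real one given above that agrees with (vi).

```python
import math

def r_norm(x, r):
    """The norm (|x_1|^r + ... + |x_N|^r)^(1/r), for r >= 1."""
    return sum(abs(c) ** r for c in x) ** (1 / r)

def parallelogram_gap(x, y, r):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 for the r-norm (zero when the property holds)."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return (r_norm(plus, r) ** 2 + r_norm(minus, r) ** 2
            - 2 * r_norm(x, r) ** 2 - 2 * r_norm(y, r) ** 2)

def scalar_product(x, y):
    # assumed complex form: sum of x_k * conjugate(y_k); linear in x, (x, y) = conj((y, x))
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

x, y = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_gap(x, y, 2))  # zero up to rounding
print(parallelogram_gap(x, y, 1))  # not zero: the norm with r = 1 is not a Hilbert norm

u, v = [1 + 2j, 3 - 1j], [2 - 1j, 1j]
lhs = abs(scalar_product(u, v))
rhs = math.sqrt(scalar_product(u, u).real * scalar_product(v, v).real)
print(lhs <= rhs)                  # Schwarz inequality on this pair
```

For these two unit vectors the gap vanishes (up to rounding) when $r = 2$ and equals 4 when $r = 1$.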

Linear functionals. The basic concept in the investigation of Banach
spaces is the analogue of $f(x) = cx$, the simplest of nontrivial functions
of classical analysis. A functional $f$ on a normed linear space has for
range space the space of the scalars (the scalars and the points below
are arbitrary, unless specified). $f$ is


linear if $f(ax + by) = af(x) + bf(y)$;

continuous if $f(x_n) \to f(x)$ as $x_n \to x$; if this property holds only
for a particular $x$, then $f$ is continuous at this $x$;

normed or bounded if $|f(x)| \le c\|x\|$ where $c < \infty$ is independent of
$x$; the norm of $f$ is then the finite number
$\|f\| = \sup_{x \ne \theta} \dfrac{|f(x)|}{\|x\|}$.

For example, a scalar product $(x, y)$ is a linear continuous and normed
functional in $x$ for every fixed $y$. Clearly, if $f$ is linear, then $f(\theta) = 0$,
and a linear functional continuous at $\theta$ is continuous.

a. Let $f$ be a linear functional on a normed linear space. If $f$ is normed,
then it is continuous; and conversely.


Proof. If $f$ is normed, then it is continuous, since
$|f(x_n) - f(x)| = |f(x_n - x)| \le c\|x_n - x\| \to 0$ as $\|x_n - x\| \to 0$.


If $f$ is not normed, then it is not continuous, since whatever be $n$ there
exists a point $x_n$ such that $|f(x_n)| > n\|x_n\|$, and, setting
$y_n = x_n / (n\|x_n\|)$, we have $|f(y_n)| > 1$ while $\|y_n\| = 1/n \to 0$.


b. The space of all normed linear functionals f on a normed linear space 
is a Banach space with norm || f ||. 


Proof. Clearly the space is normed and linear and it remains to 
prove that it is complete. 

Let $\|f_m - f_n\| \to 0$ as $m, n \to \infty$. For every $\epsilon > 0$ there exists
an $n_\epsilon$ such that $\|f_m - f_n\| < \epsilon$ for $m, n \ge n_\epsilon$; hence
$|f_m(x) - f_n(x)| < \epsilon\|x\|$ whatever be $x$. Since the space of scalars is complete, it follows
that there exists a function $f$ of $x$ such that $f_n(x) \to f(x)$ and, clearly,
$f$ is linear and normed. By letting $m \to \infty$, we have, for $n \ge n_\epsilon$,
$|f(x) - f_n(x)| \le \epsilon\|x\|$ whatever be $x$, that is, $\|f_n - f\| \le \epsilon$. Hence $f_n \to f$
and the proposition is proved.

What precedes applies word for word to more general functions (mappings,
transformations) on a normed linear space to a normed linear
space with the same scalars, and the foregoing proposition remains valid,
provided the range space is complete; it suffices to replace every $|f(x)|$
by $\|f(x)\|$.

The Banach space of normed linear functionals on a Banach space is
said to be its adjoint; a Hilbert space is adjoint to itself. However,
a priori, the adjoint space may consist only of the trivial null functional
$f$ with $\|f\| = 0$. That it is not so will follow (see Corollary 1) from
the basic Hahn-Banach


A. EXTENSION THEOREM. If $f$ is a normed linear functional on a linear
subspace $A$ of a normed linear space, then $f$ can be extended to a normed
linear functional on the whole space without changing its norm.


Proof. 1° We begin by showing that we can extend the domain of
$f$ point by point. Let $x_0 \notin A$ and let $\|f\| = 1$; this does not restrict
the generality. First assume that the scalars, hence $f$, are real.

The linearity condition determines $f(x + ax_0)$, $x \in A$, by setting it
equal to $f(x) + af(x_0)$, so that it suffices to show that there exists a
number $f(x_0)$ such that $|f(x) + af(x_0)| \le \|x + ax_0\|$ for every $x \in A$
and every number $a$. Since $A$ is a linear subspace, we can replace $x$
by $ax$ and, by letting $x$ vary, the condition becomes
\[ \sup_x \{-\|x + x_0\| - f(x)\} \le f(x_0) \le \inf_x \{\|x + x_0\| - f(x)\}. \]

Therefore, acceptable values of $f(x_0)$ exist if the above supremum
is no greater than the above infimum, that is, if whatever be $x', x'' \in A$
\[ -\|x' + x_0\| - f(x') \le \|x'' + x_0\| - f(x'') \]
or
\[ f(x'') - f(x') \le \|x'' + x_0\| + \|x' + x_0\|. \]

Since by linearity of $f$ and the triangle inequality
\[ f(x'') - f(x') = f(x'' - x') \le \|x'' - x'\| \le \|x'' + x_0\| + \|x' + x_0\|, \]
acceptable values of $f(x_0)$ exist.

We can pass from real scalars to complex scalars, as follows: From
$f(ix) = if(x)$ it follows that $f(x) = g(x) - ig(ix)$, $x \in A$, where $g = \Re f$
is a real-valued linear functional with $\|g\| \le 1$; $g$ extends first for all
points $x + ax_0$ then for all points $(x + ax_0) + b \cdot ix_0 = x + (a + ib)x_0$,
$a$, $b$ real, and $f$ extends by the foregoing relation. Now observe that $f$
is linear on the so extended domain and that, for any given point $x$,
setting $f(x) = re^{i\alpha}$, $r \ge 0$, $\alpha$ real, we obtain
$|f(x)| = g(e^{-i\alpha}x) \le \|x\|$.

2° We can extend the domain of f point by point. The family of 
all possible extensions of f to linear functionals without change of norm 
is partially ordered by inclusion of their domains. Any linearly ordered 
subfamily of extensions has a supremum in the family—the extension 
on the union of the domains. According to a consequence of the axiom 
of choice (Zorn’s theorem), it follows that the whole family has a su- 
premum which is a member of the family. It must have for domain 
the whole space, for otherwise, by 1°, it could be extended further. 
The theorem is proved. 




COROLLARY 1. Let $x_0$ be a nonzero point of a normed linear space,
and let $A$ be a closed linear subspace. There exist linear functionals $f$,
$f'$ on the space such that
\[ \|f\| = 1 \quad \text{and} \quad f(x_0) = \|x_0\|, \]
\[ f' = 0 \text{ on } A \quad \text{and} \quad f'(x_0) = d(x_0, A) = \inf_{x \in A} d(x_0, x). \]

Set $f(ax_0) = a\|x_0\|$, $f'(ax_0 + x) = a\,d(x_0, A)$, $x \in A$, and extend.

COROLLARY 2. A functional $f$ on a set $A$ in a normed linear space extends
to a normed linear functional on the whole space with norm bounded
by $c$ $(< \infty)$ if, and only if,
\[ \Bigl| \sum_k a_k f(x_k) \Bigr| \le c\, \Bigl\| \sum_k a_k x_k \Bigr\| \]
whatever be the finite number of arbitrary points $x_k \in A$ and of arbitrary
scalars $a_k$.

Proof. The "only if" assertion is immediate. As for the "if" assertion,
assume that the inequality is true, and observe that the linear
closure of $A$ consists of all points of the form $x = \sum_k a_k x_k$. Linearity
of $f$ on this closure implies that we must set $f(x) = \sum_k a_k f(x_k)$. Then,
on the closure, $|f(x)| \le c\|x\|$, and $f$ is uniquely determined, since,
for $x = \sum_k a_k x_k = \sum_{k'} a'_{k'} x'_{k'}$, we have
\[ \Bigl| \sum_k a_k f(x_k) - \sum_{k'} a'_{k'} f(x'_{k'}) \Bigr| \le c\, \Bigl\| \sum_k a_k x_k - \sum_{k'} a'_{k'} x'_{k'} \Bigr\| = 0. \]

The assertion follows by the extension theorem.

This corollary permits us to solve various moment problems as well 
as to find conditions for existence of solutions of systems of linear equa- 
tions with an infinity of unknowns. 


§ 3. ADDITIVE SET FUNCTIONS 


3.1 Additivity and continuity. A set function $\varphi$ is defined on a nonempty
class $\mathcal{C}$ of sets in a space $\Omega$ by assigning to every set $A \in \mathcal{C}$ a
single number $\varphi(A)$, finite or infinite, the value of $\varphi$ at $A$. If all values
of $\varphi$ are finite, $\varphi$ is said to be finite, and we write $|\varphi| < \infty$. If every
set in $\mathcal{C}$ is a countable union of sets in $\mathcal{C}$ at which $\varphi$ is finite, $\varphi$ is said to
be $\sigma$-finite. To avoid trivialities, we assume that every set function
has at least one finite value. Unless otherwise stated, $\varphi$ denotes a set
function and all sets considered are sets of the class on which this function
is defined, so that the properties below are valid as long as $\varphi$ is defined for
the sets which appear there.

$\varphi$ is said to be additive if
\[ \varphi\Bigl(\sum A_j\Bigr) = \sum \varphi(A_j) \]
either for every countable or only for every finite class of disjoint
sets. In the first case $\varphi$ is said to be countably additive or $\sigma$-additive,
and in the second case $\varphi$ is said to be finitely additive. In order that
sums $\sum \varphi(A_j)$ be always meaningful we have to exclude the possibility
of expressions of the form $+\infty - \infty$. In fact, if the sums always exist,
$\varphi$ is defined on a field, and $\varphi(A) = +\infty$ and $\varphi(B) = -\infty$, then
$\varphi(\Omega) = \varphi(A) + \varphi(A^c) = +\infty$ and $\varphi(\Omega) = \varphi(B) + \varphi(B^c) = -\infty$,
while the function $\varphi$ is single-valued. Thus, by definition,

an additive set function has the additivity property above, and one of the
values $+\infty$ or $-\infty$ is not allowed.

To fix ideas we assume that the value $-\infty$ is excluded, unless otherwise
stated.

A nonnegative additive set function is called a content or a measure
according as it is finitely additive or $\sigma$-additive. Let $\varphi$ be additive.
If $A \supset B$, then, by additivity,
\[ \varphi(A) = \varphi(B) + \varphi(A - B). \]
It follows, upon taking $A = B + \emptyset = B$ with $\varphi(B)$ finite, that $\varphi(\emptyset) = 0$.
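A finite example may help fix these definitions. The sketch below is an illustration under stated assumptions, not part of the original text: the space has four points, the set function is obtained by summing hypothetical point weights (one of them negative), and the names `weights` and `phi` are made up for the example. Exact rational arithmetic is used so that the additivity relations hold with equality.

```python
from fractions import Fraction

# Hypothetical point weights on a four-point space; negative weights are allowed.
weights = {"a": Fraction(1, 2), "b": Fraction(-1, 3), "c": Fraction(2), "d": Fraction(0)}

def phi(A):
    """Additive set function: the sum of the weights of the points of A."""
    return sum((weights[w] for w in A), Fraction(0))

A = frozenset("abc")
B = frozenset("ab")  # B is contained in A

empty_value = phi(frozenset())                  # value at the empty set
difference_ok = phi(A) == phi(B) + phi(A - B)   # phi(A) = phi(B) + phi(A - B)
```

On a finite space every set function of this form is finitely additive and, trivially, countably additive; the distinction between the two notions only appears on infinite classes.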


A convergent series of terms, which are not necessarily of constant
sign, may depend upon the order of the terms. This possibility is excluded
in our case by

a. If $\varphi$ is $\sigma$-additive and $|\varphi(\sum A_n)| < \infty$, then the series $\sum \varphi(A_n)$ is
absolutely convergent.

Proof. Set $A_n^+ = A_n$ or $\emptyset$ according as $\varphi(A_n) \ge 0$ or $\varphi(A_n) < 0$,
and set $A_n^- = A_n$ or $\emptyset$ according as $\varphi(A_n) \le 0$ or $\varphi(A_n) > 0$. Then
\[ \varphi\Bigl(\sum A_n^+\Bigr) = \sum \varphi(A_n^+), \qquad \varphi\Bigl(\sum A_n^-\Bigr) = \sum \varphi(A_n^-), \]
and the terms of each series are of constant sign. Since the value $-\infty$
is excluded, the last series converges. Since the sum of both series
converges, so does the first series. The assertion follows.


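b. If $\varphi(A)$ is finite and $A \supset B$, then $\varphi(B)$ is finite; in particular, if
$\varphi(\Omega)$ is finite, then $\varphi$ is finite. If $\varphi \ge 0$, then $\varphi$ is nondecreasing:
$\varphi(A) \ge \varphi(B)$ for $A \supset B$, and subadditive: $\varphi(\bigcup A_j) \le \sum \varphi(A_j)$.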
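Subadditivity rests on replacing the sets $A_1, A_2, \ldots$ by the disjoint pieces $A_1$, $A_1^c A_2$, $A_1^c A_2^c A_3$, and so on, which have the same union. The sketch below illustrates this on a finite space; the masses, the three sets, and the helper name `disjointify` are hypothetical choices made for the example.

```python
from fractions import Fraction

# Hypothetical nonnegative point masses on {1, 2, 3, 4}.
mass = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(3, 8)}

def phi(A):
    """Nonnegative additive set function: total mass of A."""
    return sum((mass[w] for w in A), Fraction(0))

sets = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3, 4})]

def disjointify(sets):
    """Return A_1, A_1^c A_2, A_1^c A_2^c A_3, ...: disjoint sets with the same union."""
    seen = frozenset()
    pieces = []
    for A in sets:
        pieces.append(A - seen)  # the part of A not covered by earlier sets
        seen |= A
    return pieces

pieces = disjointify(sets)
union = frozenset().union(*sets)

lhs = phi(union)                        # phi of the union
middle = sum(phi(p) for p in pieces)    # sum over the disjoint pieces
rhs = sum(phi(A) for A in sets)         # sum over the original sets
```

Here `lhs` equals `middle` by additivity, and `middle` is at most `rhs` because each piece is contained in the corresponding set and the set function is nondecreasing.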


Only the very last assertion needs verification and follows from
\[ \varphi\Bigl(\bigcup A_j\Bigr) = \varphi(A_1 + A_1^c A_2 + A_1^c A_2^c A_3 + \cdots) \]
\[ = \varphi(A_1) + \varphi(A_1^c A_2) + \varphi(A_1^c A_2^c A_3) + \cdots \]
\[ \le \varphi(A_1) + \varphi(A_2) + \varphi(A_3) + \cdots. \]

We intend to show that the difference between finite additivity and
$\sigma$-additivity lies in continuity properties. $\varphi$ is said to be continuous
from below or from above according as
\[ \varphi(\lim A_n) = \lim \varphi(A_n) \]
for every sequence $A_n \uparrow$, or for every sequence $A_n \downarrow$ such that $\varphi(A_n)$ is
finite for some value $n_0$ of $n$ (hence, by b, for all $n \ge n_0$). If $\varphi$ is
continuous from above and from below, it is said to be continuous.
Continuity might hold at a fixed set $A$ only, that is, for all monotone
sequences which converge to $A$; continuity at $\emptyset$ reduces to continuity
from above at $\emptyset$.


A. CONTINUITY THEOREM FOR ADDITIVE SET FUNCTIONS. A $\sigma$-additive
set function is finitely additive and continuous. Conversely, if a set function
is finitely additive and, either continuous from below, or finite and
continuous at $\emptyset$, then the set function is $\sigma$-additive.


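Proof. Let $\varphi$ be $\sigma$-additive and, a fortiori, additive. $\varphi$ is continuous
from below, for, if $A_n \uparrow$, then
\[ \lim A_n = \bigcup A_n = A_1 + (A_2 - A_1) + (A_3 - A_2) + \cdots \]
so that
\[ \varphi(\lim A_n) = \lim \{\varphi(A_1) + \varphi(A_2 - A_1) + \cdots + \varphi(A_n - A_{n-1})\} = \lim \varphi(A_n). \]

$\varphi$ is continuous from above, for, if $A_n \downarrow$ and $\varphi(A_{n_0})$ is finite, then
$A_{n_0} - A_n \uparrow$ for $n \ge n_0$, the foregoing result for continuity from below
applies and, hence,
\[ \varphi(A_{n_0}) - \varphi(\lim A_n) = \varphi(\lim (A_{n_0} - A_n)) = \lim \varphi(A_{n_0} - A_n) = \varphi(A_{n_0}) - \lim \varphi(A_n) \]
or
\[ \varphi(\lim A_n) = \lim \varphi(A_n). \]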


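Conversely, let $\varphi$ be finitely additive. If $\varphi$ is continuous from below,
then
\[ \varphi\Bigl(\sum_{n=1}^{\infty} A_n\Bigr) = \varphi\Bigl(\lim \sum_{k=1}^{n} A_k\Bigr) = \lim \varphi\Bigl(\sum_{k=1}^{n} A_k\Bigr) = \lim \sum_{k=1}^{n} \varphi(A_k) = \sum_{n=1}^{\infty} \varphi(A_n), \]
so that $\varphi$ is $\sigma$-additive. If $\varphi$ is finite and continuous at $\emptyset$, then
$\sigma$-additivity follows from
\[ \varphi\Bigl(\sum_{k=1}^{\infty} A_k\Bigr) = \varphi\Bigl(\sum_{k=1}^{n} A_k\Bigr) + \varphi\Bigl(\sum_{k=n+1}^{\infty} A_k\Bigr) = \sum_{k=1}^{n} \varphi(A_k) + \varphi\Bigl(\sum_{k=n+1}^{\infty} A_k\Bigr) \]
and
\[ \varphi\Bigl(\sum_{k=n+1}^{\infty} A_k\Bigr) \to \varphi(\emptyset) = 0. \]
The proof is complete.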
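The last step of the proof splits a countable sum into a finite part and a tail that decreases to the empty set. The sketch below illustrates that split on one hypothetical example, the set function on the positive integers with $\varphi(\{k\}) = 2^{-k}$; the tail value $2^{-n}$ is entered in closed form as an assumption of the example rather than computed from an infinite sum, and all function names are made up.

```python
from fractions import Fraction

def phi_singleton(k):
    """phi({k}) = 2**-k on the positive integers (hypothetical example)."""
    return Fraction(1, 2 ** k)

def partial_sum(n):
    """phi({1, ..., n}) by finite additivity."""
    return sum((phi_singleton(k) for k in range(1, n + 1)), Fraction(0))

def tail(n):
    """phi({n+1, n+2, ...}), entered in closed form as 2**-n."""
    return Fraction(1, 2 ** n)

total = Fraction(1)  # phi of the whole space in this example

# phi(whole space) = phi({1..n}) + phi(tail) for every n, and the tail tends to 0.
checks = [partial_sum(n) + tail(n) == total for n in range(1, 30)]
```

Since the tails tend to 0, the partial sums tend to the value at the whole space, which is the $\sigma$-additivity relation for this particular decomposition.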


The continuity properties of a $\sigma$-additive set function $\varphi$ acquire their
full significance when $\varphi$ is defined on a $\sigma$-field. Then, not only is $\varphi$
defined for all countable sums and monotone limits of sets of the $\sigma$-field
but, moreover, $\varphi$ attains its extrema at some sets of this $\sigma$-field. More
precisely

c. If $\varphi$ on a $\sigma$-field $\mathcal{A}$ is $\sigma$-additive, then there exist sets $C$ and $D$ of $\mathcal{A}$
such that $\varphi(C) = \sup \varphi$ and $\varphi(D) = \inf \varphi$.

Proof. We prove the existence of $C$; the proof of the existence of $D$
is similar. If $\varphi(A) = +\infty$ for some $A \in \mathcal{A}$, then we can set $C = A$
and the theorem is trivially true. Thus, let $\varphi < \infty$, so that, since the
value $-\infty$ is excluded, $\varphi$ is finite.

There exists a sequence $\{A_n\} \subset \mathcal{A}$ such that $\varphi(A_n) \to \sup \varphi$. Let
$A = \bigcup A_n$ and, for every $n$, consider the partition of $A$ into $2^n$ sets
$A_{nm}$ of the form $\bigcap_{k=1}^{n} A'_k$, where $A'_k = A_k$ or $A - A_k$; for $n < n'$, every
$A_{nm}$ is a finite sum of sets $A_{n'm'}$. Let $B_n$ be the sum of all those $A_{nm}$
for which $\varphi$ is nonnegative; if there are none, set $B_n = \emptyset$. Since, on the
one hand, $A_n$ is the sum of some of the $A_{nm}$ and, on the other hand, for
$n' > n$, every $A_{n'm'}$ is either in $B_n$ or disjoint from $B_n$, we have
\[ \varphi(A_n) \le \varphi(B_n) \le \varphi(B_n \cup B_{n+1} \cup \cdots \cup B_{n'}). \]

Letting $n' \to \infty$, it follows, by continuity from below, that
\[ \varphi(A_n) \le \varphi(B_n) \le \varphi\Bigl(\bigcup_{k=n}^{\infty} B_k\Bigr). \]

Letting now $n \to \infty$ and setting $C = \lim \bigcup_{k=n}^{\infty} B_k$, it follows, by
continuity from above ($\varphi$ is finite), that $\sup \varphi \le \varphi(C)$. But $\varphi(C) \le \sup \varphi$
and, thus, $\varphi(C) = \sup \varphi$. The proof is complete.


COROLLARY. If $\varphi$ on a $\sigma$-field $\mathcal{A}$ is $\sigma$-additive (and the value $-\infty$ is
excluded), then $\varphi$ is bounded below.

3.2 Decomposition of additive set functions. We shall find later that
the "natural" domains of $\sigma$-additive set functions are $\sigma$-fields. We intend
to show that on such domains $\sigma$-additive set functions coincide
with signed measures, that is, differences of two measures of which one
at least is finite. Clearly, a signed measure is $\sigma$-additive so that we
need only to prove the converse.

Let $\varphi$ be an additive function on a field $\mathcal{C}$ and define $\varphi^+$ and $\varphi^-$ on
$\mathcal{C}$ by
\[ \varphi^+(A) = \sup_{B \subset A} \varphi(B), \qquad \varphi^-(A) = -\inf_{B \subset A} \varphi(B), \qquad A, B \in \mathcal{C}. \]

The set functions $\varphi^+$, $\varphi^-$ and $\bar{\varphi} = \varphi^+ + \varphi^-$ are called the upper, lower,
and total variation of $\varphi$ on $\mathcal{C}$, respectively. Since $\varphi(\emptyset) = 0$, these
variations are nonnegative.

A. JORDAN-HAHN DECOMPOSITION THEOREM. If $\varphi$ on a $\sigma$-field $\mathcal{A}$ is
$\sigma$-additive, then there exists a set $D$ such that, for every $A \in \mathcal{A}$,
\[ -\varphi^-(A) = \varphi(AD), \qquad \varphi^+(A) = \varphi(AD^c). \]
$\varphi^+$ and $\varphi^-$ are measures and $\varphi = \varphi^+ - \varphi^-$ is a signed measure.


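Proof. According to 3.1c, there exists a set $D \in \mathcal{A}$ such that
$\varphi(D) = \inf \varphi$; since the value $-\infty$ is excluded, we have
\[ -\infty < \varphi(D) = \inf \varphi \le 0. \]

For every set $A \in \mathcal{A}$, $\varphi(AD) \le 0$ and $\varphi(AD^c) \ge 0$, since $\varphi \ge \varphi(D)$
while, if $\varphi(AD) > 0$, then
\[ \varphi(D - AD) = \varphi(D) - \varphi(AD) < \varphi(D), \]
and if $\varphi(AD^c) < 0$, then
\[ \varphi(D + AD^c) = \varphi(D) + \varphi(AD^c) < \varphi(D). \]
It follows that, for every $B \subset A$ ($A, B \in \mathcal{A}$),
\[ \varphi(B) \le \varphi(BD^c) \le \varphi(BD^c) + \varphi((A - B)D^c) = \varphi(AD^c), \]
and, hence, $\varphi^+(A) \le \varphi(AD^c)$. Since $AD^c$ is one of the $B$'s, the reverse
inequality is also true. Therefore, for every $A \in \mathcal{A}$, $\varphi^+(A) = \varphi(AD^c)$
and, similarly, $-\varphi^-(A) = \varphi(AD)$, so that
\[ \varphi(A) = \varphi(AD^c) + \varphi(AD) = \varphi^+(A) - \varphi^-(A). \]
Moreover, $\varphi^+$ on $\mathcal{A}$ is a measure since $\varphi^+ \ge 0$ and
\[ \varphi^+\Bigl(\sum A_j\Bigr) = \varphi\Bigl(\sum A_j D^c\Bigr) = \sum \varphi(A_j D^c) = \sum \varphi^+(A_j). \]
Similarly $\varphi^-$ on $\mathcal{A}$ is a measure and, furthermore, it is bounded by
$-\varphi(D)$ which is finite. Thus, $\varphi = \varphi^+ - \varphi^-$ is a signed measure, and
the proof is complete.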
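On a finite space the decomposition can be verified by brute force. The sketch below is an illustration under stated assumptions, not the author's construction: the signed set function is the sum of hypothetical point weights, $D$ is taken to be the set of points of negative weight, and the upper and lower variations are computed directly from their definitions as a supremum and an infimum over all subsets. The names `weights`, `phi`, `subsets`, `upper`, `lower` are made up for the example.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical signed point weights on a five-point space.
weights = {0: Fraction(2), 1: Fraction(-3), 2: Fraction(1, 2), 3: Fraction(-1, 4), 4: Fraction(0)}
omega = frozenset(weights)

def phi(A):
    """Signed additive set function: sum of the weights of the points of A."""
    return sum((weights[w] for w in A), Fraction(0))

def subsets(A):
    """All subsets of the finite set A."""
    items = sorted(A)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

# Candidate set D: the points of negative weight.
D = frozenset(w for w in omega if weights[w] < 0)

def upper(A):
    """Upper variation: supremum of phi over the subsets of A."""
    return max(phi(B) for B in subsets(A))

def lower(A):
    """Lower variation: minus the infimum of phi over the subsets of A."""
    return -min(phi(B) for B in subsets(A))

# Check phi^+(A) = phi(A D^c), -phi^-(A) = phi(A D), phi = phi^+ - phi^- for all A.
hahn_ok = all(
    upper(A) == phi(A - D) and -lower(A) == phi(A & D) and phi(A) == upper(A) - lower(A)
    for A in subsets(omega)
)
inf_attained = phi(D) == min(phi(A) for A in subsets(omega))
```

The check `inf_attained` mirrors the first step of the proof: the set $D$ is one at which the set function attains its infimum.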


JORDAN DECOMPOSITION. If $\mathcal{A}$ is only a field but $\varphi$ is also bounded,
then it is still a signed measure. Prove, proceeding directly from the
definitions, showing first that $\varphi^{\pm}$ are bounded measures.


*§ 4. CONSTRUCTION OF MEASURES ON $\sigma$-FIELDS


4.1 Extension of measures. If two set functions $\varphi$ on $\mathcal{C}$ and $\varphi'$ on
$\mathcal{C}'$ take the same values at sets of a common subclass $\mathcal{C}''$, we say that
$\varphi$ and $\varphi'$ agree or coincide on $\mathcal{C}''$. If $\mathcal{C} \subset \mathcal{C}'$ and $\varphi$ and $\varphi'$ agree on $\mathcal{C}$,
we say that $\varphi$ is a restriction of $\varphi'$ on $\mathcal{C}$, and $\varphi'$ is an extension of $\varphi$ on
$\mathcal{C}'$. The general extension problem can be stated as follows: find
extensions of $\varphi$ which preserve some specified properties. If, given $\mathcal{C}' \supset \mathcal{C}$,
there is one, and only one, such extension on $\mathcal{C}'$, we say that this
extension is determined.

Here, we are concerned with the extension of measures to measures
and shall denote extensions and restrictions of a measure $\mu$ by the same
letter; as long as their domains are specified, there is no confusion possible.
While any restriction of a measure is determined and is a measure,
an extension of a measure to a measure on a given class may not exist,
and if one exists it may not be unique. Our aim is to produce classes
on which such extensions exist, and cases where they are determined.
The results of the investigation are summarized by the Carathéodory

A. EXTENSION THEOREM. A measure $\mu$ on a field $\mathcal{C}$ can be extended
to a measure on the minimal $\sigma$-field over $\mathcal{C}$. If, moreover, $\mu$ is $\sigma$-finite,
then the extension is determined and is $\sigma$-finite.


We prove the extension theorem by means of an intermediate weaker 
extension which preserves a part only of the properties characterizing 
a measure. We shall need various notions that we collect here. 

A set function $\mu^{\circ}$ on the class $S(\Omega)$ of all sets in the space $\Omega$ is called
an outer measure if it is sub $\sigma$-additive, nondecreasing, and takes the
value 0 at $\emptyset$:
\[ \mu^{\circ}\Bigl(\bigcup A_j\Bigr) \le \sum \mu^{\circ}(A_j) \quad \text{for every countable class } \{A_j\}, \]
\[ \mu^{\circ}(A) \le \mu^{\circ}(B) \quad \text{for } A \subset B, \qquad \mu^{\circ}(\emptyset) = 0. \]
A set $A$ is called $\mu^{\circ}$-measurable if, for every set $D \subset \Omega$,
\[ \mu^{\circ}(D) \ge \mu^{\circ}(AD) + \mu^{\circ}(A^cD). \]
Since the relation is always true when $\mu^{\circ}(D) = \infty$, it suffices to consider
sets $D$ with $\mu^{\circ}(D) < \infty$. Since $\mu^{\circ}$ is sub $\sigma$-additive, the reverse inequality
is always true and, hence, $A$ is $\mu^{\circ}$-measurable if, and only if,
\[ \mu^{\circ}(D) = \mu^{\circ}(AD) + \mu^{\circ}(A^cD). \]

The class of all $\mu^{\circ}$-measurable sets will be denoted by $\mathcal{A}^{\circ}$ and, clearly,
contains $\emptyset$ and $\Omega$. The outer extension of a measure $\mu$ given on a field
$\mathcal{C}$ is defined for all sets $A \subset \Omega$ by
\[ \mu^{\circ}(A) = \inf \sum_j \mu(A_j), \]
where the infimum is taken over all countable classes $\{A_j\} \subset \mathcal{C}$ such
that $A \subset \bigcup A_j$ (coverings in $\mathcal{C}$ of $A$, for short). Since $\Omega \in \mathcal{C}$, there is
at least one covering (consisting of $\Omega$) in $\mathcal{C}$ of every $A$ so that the
definition of an outer extension is justified. The use of the same symbol
$\mu^{\circ}$ both for an outer measure and an outer extension is due to the
property, to be proved first, that the outer extension of the measure $\mu$ on $\mathcal{C}$
is an extension of $\mu$ to an outer measure. Next we shall prove that the
restriction to $\mathcal{A}^{\circ}$ of $\mu^{\circ}$ is a measure and that $\mathcal{A}^{\circ}$ is a $\sigma$-field, and the
extension theorem will follow.


a. The outer extension $\mu^{\circ}$ of a measure $\mu$ on a field $\mathcal{C}$ is an extension of
$\mu$ to an outer measure.

Proof. We prove first that $\mu^{\circ}$ is an extension of $\mu$.

If $A \in \mathcal{C}$, then $\mu^{\circ}(A) \le \mu(A)$. On the other hand, since $\mu$ is a measure,
$\mu(A) \le \sum \mu(A_j)$ for every covering $\{A_j\}$ in $\mathcal{C}$ of $A$, so that
$\mu(A) \le \mu^{\circ}(A)$ and, hence, $\mu^{\circ}(A) = \mu(A)$ for $A \in \mathcal{C}$. It remains to prove
that $\mu^{\circ}$ is an outer measure.

To begin with, $\mu^{\circ}(\emptyset) = 0$ since $\emptyset \in \mathcal{C}$. Furthermore, $\mu^{\circ}(A) \le \mu^{\circ}(B)$
for $A \subset B$, since every covering in $\mathcal{C}$ of $B$ is also a covering of $A$. Finally,
we prove that $\mu^{\circ}$ is sub $\sigma$-additive.

Let $\varepsilon > 0$ and let $\{A_j\}$ be an arbitrary countable class. For every
$A_j$ there is a covering $\{A_{jk}\}$ in $\mathcal{C}$ such that
\[ \sum_k \mu(A_{jk}) \le \mu^{\circ}(A_j) + \frac{\varepsilon}{2^j}. \]
Since $\bigcup_j A_j \subset \bigcup_{j,k} A_{jk}$, it follows that
\[ \mu^{\circ}\Bigl(\bigcup_j A_j\Bigr) \le \sum_{j,k} \mu(A_{jk}) \le \sum_j \mu^{\circ}(A_j) + \varepsilon \]
and, $\varepsilon > 0$ being arbitrarily close to zero, sub $\sigma$-additivity is proved.

b. If $\mu^{\circ}$ is an outer measure, then the class $\mathcal{A}^{\circ}$ of $\mu^{\circ}$-measurable sets is a
$\sigma$-field and $\mu^{\circ}$ on $\mathcal{A}^{\circ}$ is a measure.


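Proof. We prove first that $\mathcal{A}^{\circ}$ is a field and $\mu^{\circ}$ on $\mathcal{A}^{\circ}$ is a content.
If $A \in \mathcal{A}^{\circ}$, then $A^c \in \mathcal{A}^{\circ}$, since the definition of $\mu^{\circ}$-measurability is
symmetric in $A$ and $A^c$. If $A, B \in \mathcal{A}^{\circ}$, then $AB \in \mathcal{A}^{\circ}$, since
\[ \mu^{\circ}(D) = \mu^{\circ}(AD) + \mu^{\circ}(A^cD) \]
\[ = \mu^{\circ}(ABD) + \mu^{\circ}(AB^cD) + \mu^{\circ}(A^cBD) + \mu^{\circ}(A^cB^cD) \]
\[ \ge \mu^{\circ}(ABD) + \mu^{\circ}(AB^cD \cup A^cBD \cup A^cB^cD) \]
\[ = \mu^{\circ}(ABD) + \mu^{\circ}((AB)^cD). \]

Thus $\mathcal{A}^{\circ}$ is closed under complementations and finite intersections and,
hence, under finite unions, so that $\mathcal{A}^{\circ}$ is a field.
$\mu^{\circ}$ is finitely additive on $\mathcal{A}^{\circ}$ since, if $A, B \in \mathcal{A}^{\circ}$ and are disjoint,
\[ \mu^{\circ}(A + B) = \mu^{\circ}((A + B)A) + \mu^{\circ}((A + B)A^c) = \mu^{\circ}(A) + \mu^{\circ}(B). \]

Since $\mu^{\circ}(A) \ge \mu^{\circ}(\emptyset) = 0$, $\mu^{\circ}$ on $\mathcal{A}^{\circ}$ is a content.
To complete the proof, it suffices to show that, if the $A_n \in \mathcal{A}^{\circ}$ are
disjoint, then $A = \sum A_n \in \mathcal{A}^{\circ}$ and $\mu^{\circ}(A) = \sum \mu^{\circ}(A_n)$.

Since $B_n = \sum_{k=1}^{n} A_k \in \mathcal{A}^{\circ}$, we have
\[ \mu^{\circ}(D) = \mu^{\circ}(B_nD) + \mu^{\circ}(B_n^cD) \ge \sum_{k=1}^{n} \mu^{\circ}(A_kD) + \mu^{\circ}(A^cD) \]
and, letting $n \to \infty$,
\[ \mu^{\circ}(D) \ge \sum \mu^{\circ}(A_nD) + \mu^{\circ}(A^cD) \ge \mu^{\circ}(AD) + \mu^{\circ}(A^cD). \]

The inequality between the extreme sides shows that $A \in \mathcal{A}^{\circ}$. The
first inequality with $D$ replaced by $A$ becomes
\[ \mu^{\circ}(A) \ge \sum \mu^{\circ}(A_n) \]
while the reverse inequality is always true. Thus
\[ \mu^{\circ}(A) = \sum \mu^{\circ}(A_n), \]
and the proof is complete.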
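The outer extension and the measurability criterion can be computed exhaustively on a finite space. The sketch below is an illustration under stated assumptions, not part of the original text: the field is generated by three hypothetical atoms of positive measure, and, because the field is finite, a countable covering can be replaced by the single field set equal to its union, so the infimum over coverings reduces to a minimum over field sets containing the given set. The names `atoms`, `field`, `outer`, `measurable` are made up.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(4))

# Hypothetical measure on the atoms of a field over omega.
atoms = [(frozenset({0, 1}), Fraction(1, 2)),
         (frozenset({2}), Fraction(1, 4)),
         (frozenset({3}), Fraction(1, 4))]

def powerset(S):
    """All subsets of the finite set S."""
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# The field: all unions of atoms, with mu additive on them.
field = {}
for r in range(len(atoms) + 1):
    for combo in combinations(atoms, r):
        members = frozenset().union(*(a for a, _ in combo))
        field[members] = sum((m for _, m in combo), Fraction(0))

def outer(A):
    """Outer extension: least measure of a field set covering A."""
    return min(m for C, m in field.items() if A <= C)

def measurable(A):
    """Caratheodory criterion: outer(D) = outer(AD) + outer(A^c D) for every D."""
    return all(outer(D) == outer(A & D) + outer(D - A) for D in powerset(omega))

measurable_sets = {A for A in powerset(omega) if measurable(A)}
```

In this example the measurable sets turn out to be exactly the sets of the field: a set that splits the atom $\{0, 1\}$ fails the criterion with $D = \{0, 1\}$.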

Remark. Most frequently, a measure $\mu$ is given on a class $\mathcal{D}$ whose
closure under finite summations or under countable summations is a
field $\mathcal{C}$. Then the requirement of $\sigma$-additivity determines the unique
extension of $\mu$ on $\mathcal{C}$.


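We are now in a position to prove the extension theorem.
1° For every $A \in \mathcal{C}$ and every $D$ there is, for every $\varepsilon > 0$, a covering
$\{A_j\}$ in $\mathcal{C}$ of $D$ such that
\[ \mu^{\circ}(D) + \varepsilon \ge \sum \mu(A_j) = \sum \mu(AA_j) + \sum \mu(A^cA_j) \ge \mu^{\circ}(AD) + \mu^{\circ}(A^cD). \]

Thus, $A \in \mathcal{A}^{\circ}$ and, hence, since the field $\mathcal{C}$ is contained in the $\sigma$-field
$\mathcal{A}^{\circ}$, the minimal $\sigma$-field $\mathcal{A}$ over $\mathcal{C}$ is contained in $\mathcal{A}^{\circ}$. It follows, according
to a and b, that the contraction on $\mathcal{A}$ of the measure $\mu^{\circ}$ on $\mathcal{A}^{\circ}$ is an
extension of $\mu$ to a measure on $\mathcal{A}$. This proves the first part of the theorem.

2° Let $\mu$ on $\mathcal{C}$ be finite, let $\mu_1$ and $\mu_2$ be two extensions of $\mu$ to measures
on $\mathcal{A}$, and let $\mathcal{M} \subset \mathcal{A}$ be the class on which $\mu_1$ and $\mu_2$ agree. Since
$\Omega$ belongs to $\mathcal{C}$, $\mu_1(\Omega) = \mu_2(\Omega) = \mu(\Omega) < \infty$; hence $\mu_1$ and $\mu_2$ are finite.
Since $\mathcal{M}$ contains $\mathcal{C}$ and, for every monotone sequence $A_n \in \mathcal{M}$,
\[ \mu_1(\lim A_n) = \lim \mu_1(A_n) = \lim \mu_2(A_n) = \mu_2(\lim A_n), \]
$\mathcal{M}$ is a monotone class. It follows, by 1.6A, that $\mathcal{M}$ contains the minimal
$\sigma$-field $\mathcal{A}$ over the field $\mathcal{C}$ and, therefore, $\mu_1$ and $\mu_2$ agree on $\mathcal{A}$.

Let now $\mu$ on $\mathcal{C}$ be $\sigma$-finite so that there is a countable class $\{A_j\} \subset \mathcal{C}$
with $\mu A_j$ finite which covers $\Omega$. Thus, the foregoing result applies to
every subspace $A_j$, and the second part of the theorem follows.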

Generalization. The extension theorem is valid for $\sigma$-finite signed measures
$\varphi = \mu' - \mu''$. Extend $\mu'$ and $\mu''$ and observe that 2° applies with
$\varphi$ instead of $\mu$.

Completion. Given a measure $\mu$ on a $\sigma$-field $\mathcal{A}$, it is always possible
to extend $\mu$ to a larger $\sigma$-field obtained as follows: For every $A \in \mathcal{A}$
and an arbitrary subset $N$ of a null set of $\mathcal{A}$, that is, a set of measure
zero, set $\mu(A \cup N) = \mu(A)$. Clearly, the class of all sets $A \cup N$ is a
$\sigma$-field $\mathcal{A}_\mu \supset \mathcal{A}$ and $\mu$ on $\mathcal{A}_\mu$ is an extension of $\mu$ to a measure on $\mathcal{A}_\mu$.
$\mathcal{A}_\mu$ is called the completion of $\mathcal{A}$ for $\mu$ and $\mu$ on $\mathcal{A}_\mu$ is called a complete
measure. It is easily seen that $\mathcal{A}_\mu \subset \mathcal{A}^{\circ}$, so that the extension theorem
provides us automatically with extensions to complete measures.

4.2 Product probabilities. A measure on a class containing the space 
is called a normed measure or a probability when its value for the whole 
space is one; we reserve the symbol P, with or without affixes, for such 
measures. 

Let $(\Omega_t, \mathcal{A}_t, P_t)$, $t \in T$, be probability spaces, that is, triplets consisting
of a space $\Omega_t$ of points $\omega_t$, a $\sigma$-field $\mathcal{A}_t$ of measurable sets $A_t$ (with or
without superscripts) in $\Omega_t$, and a probability $P_t$ on $\mathcal{A}_t$. Let $\mathcal{C}_T$ be the
class of all measurable cylinders of the form
$\prod_{t \in T_N} A_t \times \prod_{t \in T - T_N} \Omega_t$ in the
product measurable space $(\prod \Omega_t, \prod \mathcal{A}_t)$. The class $\mathcal{B}_T$ of all finite
sums of these cylinders is a field, and the minimal $\sigma$-field $\mathcal{A}_T$ over $\mathcal{B}_T$
is, by definition, the product $\sigma$-field $\prod \mathcal{A}_t$. The product probability
$P_T = \prod P_t$ on the class $\mathcal{C}_T$ is defined by assigning to every interval
cylinder the product of the probabilities of its sides: in symbols,
\[ P_T\Bigl(\prod_{t \in T_N} A_t \times \prod_{t \in T - T_N} \Omega_t\Bigr) = \prod_{t \in T_N} P_tA_t \prod_{t \in T - T_N} P_t\Omega_t = \prod_{t \in T_N} P_tA_t. \]

Clearly, $P_T\Omega_T = 1$ and $P_T$ on $\mathcal{C}_T$ is finitely additive and determines
its extension to a finitely additive set function $P_T$ on $\mathcal{B}_T$. The defining
term "product-probability" is justified by the following theorem (Andersen
and Jessen).


A. PRODUCT PROBABILITY THEOREM. The product probability $P_T$ on
$\mathcal{B}_T$ is $\sigma$-additive and determines its extension to a probability $P_T$ on the
product $\sigma$-field $\mathcal{A}_T$.

Thus, the triplet $(\Omega_T, \mathcal{A}_T, P_T)$ is a probability space, to be called the
product probability space.
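For two finite factor spaces the product probability can be written out point by point. The sketch below is an illustration only: the two probability tables `P1` and `P2`, the events `A1` and `A2`, and the helper `prob` are hypothetical. It checks that a rectangle $A_1 \times A_2$ receives the product of the probabilities of its sides and that a cylinder $A_1 \times \Omega_2$ receives $P_1 A_1$.

```python
from fractions import Fraction
from itertools import product

# Two hypothetical finite probability spaces, given by their point probabilities.
P1 = {"H": Fraction(1, 3), "T": Fraction(2, 3)}
P2 = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

# Product probability of each point (w1, w2) of the product space.
P = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}

def prob(event, table):
    """Probability of a finite event as the sum of its point probabilities."""
    return sum((table[w] for w in event), Fraction(0))

A1 = {"H"}
A2 = {1, 3}
rectangle = set(product(A1, A2))   # A1 x A2
cylinder = set(product(A1, P2))    # A1 x Omega_2

rectangle_ok = prob(rectangle, P) == prob(A1, P1) * prob(A2, P2)
cylinder_ok = prob(cylinder, P) == prob(A1, P1)
```

With finitely many factors and finite spaces no limiting argument is needed; the content of the theorem lies in the infinite case.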

Proof. 1° On account of the extension theorem, it suffices to prove
that $P_T$ on $\mathcal{B}_T$ is $\sigma$-additive. Since it is obviously finitely additive on
$\mathcal{B}_T$, on account of the continuity theorem for additive set functions it
suffices to prove that $P_T$ on $\mathcal{B}_T$ is continuous at $\emptyset$. Ab contrario, given
$\varepsilon > 0$ arbitrarily close to 0, it suffices to prove that, for every nonincreasing
sequence of measurable cylinders $A^n \downarrow A$ with $P_TA^n > \varepsilon$ for
every $n$, the limit set $A$ is not empty. Since every cylinder $A^n$ depends
only upon a finite subset of indices, the set of all indices involved in
defining the sequence $A^n$ is countable. By interchanging, if necessary,
the indices, we can restrict ourselves to the product space $\Omega_T = \prod_{n=1}^{\infty} \Omega_n$
and sets $A^n = D^n \times \Omega'_n$ with $D^n \subset \Omega_1 \times \cdots \times \Omega_n$,
$\Omega'_n = \Omega_{n+1} \times \Omega_{n+2} \times \cdots$.

If the set of all indices is finite, then there is an integer N such that, 
for every 7, all the factors which follow the Nth one reduce to Qy, and 
the argument below applies with corresponding modifications. 

2° Let $P'_1, P'_2, \ldots$ be the set functions defined on the fields $\mathcal{B}'_1$,
$\mathcal{B}'_2, \ldots$ of all measurable cylinders in $\Omega'_1, \Omega'_2, \ldots$, as $P_T$ is defined on
$\mathcal{B}_T$. Let $A^n(\omega_1)$, $A^n(\omega_1, \omega_2)$, $\ldots$ be the sections of $A^n$ at $\omega_1 \in \Omega_1$,
$(\omega_1, \omega_2) \in \Omega_1 \times \Omega_2$, etc. Clearly, $A^n(\omega_1) \in \mathcal{B}'_1$. It is easily seen that, if
$B_1^n$ is the set of all $\omega_1$ such that
\[ P'_1A^n(\omega_1) > \frac{\varepsilon}{2}, \]
then
\[ P_1B_1^n + \frac{\varepsilon}{2}(1 - P_1B_1^n) \ge P_TA^n > \varepsilon \]
and, hence,
\[ P_1B_1^n > \frac{\varepsilon}{2}. \]
Since $A^n \downarrow$ implies that $B_1^n \downarrow$, it follows that, for $B_1 = \lim B_1^n$,
$P_1B_1 \ge \varepsilon/2$. Thus, $B_1$ is not empty; hence, there is a point $\bar{\omega}_1 \in \Omega_1$ common
to all $B_1^n$ and, for every $n$, $P'_1(A^n(\bar{\omega}_1)) > \varepsilon/2$. The same argument
applied to $A^n(\bar{\omega}_1) \downarrow$ yields a point $\bar{\omega}_2 \in \Omega_2$ such that
$P'_2(A^n(\bar{\omega}_1, \bar{\omega}_2)) > \varepsilon/2^2$,
and so on. Therefore, the point $\bar{\omega} = (\bar{\omega}_1, \bar{\omega}_2, \ldots)$ is common to all
$A^n$, so that the limit set $A$ is not empty, and the proof is complete.

We pass now to Borel spaces.

4.3 Consistent probabilities on Borel fields. We introduce the following
terminology. The set $R = (-\infty, +\infty)$ of all finite numbers $x$
is a real line, the minimal $\sigma$-field over the class of all intervals is the
Borel field $\mathcal{B}$ in $R$, the elements of $\mathcal{B}$ are Borel sets in $R$, and the measurable
space $(R, \mathcal{B})$ is a Borel line. Similarly, the product space
$R_T = \prod_{t \in T} R_t$, where every $R_t$ is a real line with points $x_t$, is a real space with
points $x_T = (x_t)$, the product $\sigma$-field $\mathcal{B}_T = \prod \mathcal{B}_t$, where every $\mathcal{B}_t$ is
the Borel field in $R_t$, is the Borel field in $R_T$ whose elements are Borel
sets in $R_T$, and the measurable space $(R_T, \mathcal{B}_T)$ is a Borel space. If $T$
is a finite set, we say that $R_T$ is a finite product space. Cylinders with
Borel bases are Borel cylinders and, clearly, the Borel field $\mathcal{B}_T$ is the
minimal $\sigma$-field over the class of all Borel cylinders or, equivalently,
over the class of all cylinders whose bases are product Borel sets.

Given a finite measure on $\mathcal{B}_T$ we can assume, by dividing it by its
value for $R_T$, that it is a probability $P_T$. Let $T_N = \{t_1, \ldots, t_N\}$ be a
finite subset of indices and let $(R_{T_N}, \mathcal{B}_{T_N})$ be the corresponding Borel
space. We define on $\mathcal{B}_{T_N}$ the marginal probability $P_{T_N}$, or projection
of $P_T$ on $R_{T_N}$, by assigning to every Borel set $B_{T_N}$ in $R_{T_N}$ the measure
of the cylinder with basis $B_{T_N}$; in symbols
\[ P_{T_N}(B_{T_N}) = P_T(B_{T_N} \times R'_{T_N}), \qquad R'_{T_N} = \prod_{t \in T - T_N} R_t. \]

Marginal probabilities are consistent in the following sense. If $R'$ and
$R''$ are two finite product subspaces of $R_T$, with marginal measures $P'$
and $P''$, respectively, then the projections of $P'$ and $P''$ on their common
subspace, if any, coincide (with the projection of $P_T$ on this subspace).
We want to prove that the converse is true (Daniell, Kolmogorov).
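Consistency of marginals is easy to see on a finite analogue. The sketch below replaces the real lines by the two-point set $\{0, 1\}$ and uses a hypothetical joint probability on three coordinates; the table `P` and the helper `marginal` are assumptions made for the illustration. It projects on the coordinate pairs (1, 2) and (2, 3) and checks that both marginals have the same projection on the common coordinate 2.

```python
from fractions import Fraction
from itertools import product

# A hypothetical joint probability on {0, 1}^3 (deliberately not a product probability).
values = [1, 2, 3, 4, 5, 6, 7, 8]
P = {x: Fraction(v, 36) for x, v in zip(product((0, 1), repeat=3), values)}

def marginal(table, keep):
    """Project a probability table on the coordinate positions listed in `keep`."""
    out = {}
    for x, p in table.items():
        key = tuple(x[i] for i in keep)
        out[key] = out.get(key, Fraction(0)) + p
    return out

P12 = marginal(P, (0, 1))   # projection on the first two coordinates
P23 = marginal(P, (1, 2))   # projection on the last two coordinates

P2_from_12 = marginal(P12, (1,))   # second coordinate, obtained through P12
P2_from_23 = marginal(P23, (0,))   # second coordinate, obtained through P23
P2_direct = marginal(P, (1,))      # second coordinate, obtained directly from P
```

The theorem that follows goes in the opposite direction: it starts from a consistent family of finite-dimensional probabilities and produces the probability on the whole product space.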


A. CONSISTENCY THEOREM. Consistent probabilities $P_{T_N}$ on Borel
fields of all finite product subspaces $R_{T_N}$ of $R_T$ determine a probability $P_T$
on the Borel field in $R_T$ such that every $P_{T_N}$ is the projection of $P_T$ on $R_{T_N}$.

Proof. To every Borel cylinder with Borel base $B_{T_N}$ in $R_{T_N}$ we assign
the probability value
\[ P_T(B_{T_N} \times R'_{T_N}) = P_{T_N}(B_{T_N}). \]

It is easily seen that $P_T$ on the class $\mathcal{C}_T$ of all Borel cylinders is finitely
additive, and the theorem will follow from the extension theorem if we
prove that $P_T$ on $\mathcal{C}_T$ is continuous at $\emptyset$.

As in the proof of the product probability theorem, it suffices to
prove that, given $\varepsilon > 0$ arbitrarily close to zero, if a sequence $A_n \downarrow A$
of Borel cylinders with bases $B_n$ formed by finite sums of intervals in
$R_1 \times \cdots \times R_n$ is such that, for every $n$,
\[ P_T(A_n) = P_{12 \cdots n}(B_n) > \varepsilon, \]
then $A$ is not empty. To simplify the writing, set $P = P_T$ and
$P_n = P_{12 \cdots n}$. Since $P_n$ is bounded and continuous from below, in every
interval in $R_1 \times \cdots \times R_n$ we can find a bounded closed interval whose
$P_n$-measure is as close as we wish to that of the original interval. Therefore,
in every $B_n$ we can find a bounded closed Borel set $B'_n$, formed
by a finite sum of bounded closed intervals, such that
$P_n(B_n - B'_n) < \varepsilon/2^{n+1}$ and, hence, if $A'_n$ is the Borel cylinder with basis $B'_n$, then
\[ P(A_n - A'_n) = P_n(B_n - B'_n) < \frac{\varepsilon}{2^{n+1}}. \]

It follows, setting $C_n = A'_1 \cap \cdots \cap A'_n$, that $P(A_n - C_n) < \varepsilon/2$ so that, since
$C_n \subset A'_n \subset A_n$,
\[ P(C_n) > P(A_n) - \frac{\varepsilon}{2} > \frac{\varepsilon}{2}. \]

Thus every $C_n$ is nonempty and we can select in it a point
$x^{(n)} = (x_1^{(n)}, x_2^{(n)}, \ldots)$. It follows from $C_1 \supset C_2 \supset \cdots$ that for every
$p = 0, 1, \ldots$, $x^{(n+p)} \in C_n \subset A'_n$ and hence
$(x_1^{(n+p)}, \ldots, x_n^{(n+p)}) \in B'_n$.

Since every $B'_n$ is bounded, we can select a subsequence $n_{1k}$ of integers
such that $x_1^{(n_{1k})} \to x_1$ as $k \to \infty$, then within it a subsequence
$n_{2k}$ such that $x_2^{(n_{2k})} \to x_2$, and so on. The diagonal subsequence of
points $x^{(n_{kk})} = (x_1^{(n_{kk})}, x_2^{(n_{kk})}, \ldots)$ converges to the point
$x = (x_1, x_2, \ldots)$ and
$(x_1^{(n_{kk})}, \ldots, x_m^{(n_{kk})}) \to (x_1, \ldots, x_m) \in B'_m$ for every $m$.
Therefore, $x \in A'_m \subset A_m$ whatever be $m$ so that $x \in \bigcap_{m=1}^{\infty} A_m$. Thus
this intersection is not empty, and the assertion is proved.

Extensions. The foregoing theorem can be extended, as follows:
Let $\mathcal{A}_n$ be the $\sigma$-field of Borel cylinders with bases in $R_1 \times \cdots \times R_n$,
and let $\mathcal{A}_\infty$ be the Borel field in $\prod R_n$.

1° If uniformly bounded measures $\mu_n$ on $\mathcal{A}_n$ form a nondecreasing
sequence, in the sense that $\mu_nA_n \le \mu_{n+1}A_n \le \cdots$ and hence $\mu_pA_n \uparrow \mu A_n$
as $p \to \infty$ whatever be $n$ and $A_n \in \mathcal{A}_n$, then $\mu$ extends to a bounded measure
on $\mathcal{A}_\infty$.

The proof reduces to the previous one as follows. The set function $\mu$
so defined on the field $\bigcup \mathcal{A}_n$ of all Borel cylinders in $\prod R_n$ is, clearly,
finitely additive and bounded. Therefore, it suffices to prove that on
this field $\mu$ is continuous at $\emptyset$. Given $\varepsilon > 0$ and $A_n \in \mathcal{A}_n$, we can find
$p$ sufficiently large so that $\mu_pA_n + \varepsilon/2^{n+2} > \mu A_n$. Then we can select
a Borel cylinder $A'_n \subset A_n$ whose basis is a closed and bounded Borel
set in $R_1 \times \cdots \times R_n$ such that $\mu_p(A_n - A'_n) < \varepsilon/2^{n+2}$. It follows that
\[ \mu A'_n + \frac{\varepsilon}{2^{n+1}} \ge \mu_pA'_n + \frac{\varepsilon}{2^{n+1}} > \mu A_n \]
so that $\mu(A_n - A'_n) < \varepsilon/2^{n+1}$. From here on, the end of the preceding
proof applies word for word.


If $\varphi_n$ on $\mathcal{A}_n$, $n = 1, 2, \ldots$, are such that $\varphi_n(A_n) = \varphi_{n+1}(A_n) = \cdots$,
$A_n \in \mathcal{A}_n$, we say that the $\varphi_n$ are consistent.

2° If the uniformly bounded $\sigma$-additive set functions $\varphi_n$ on $\mathcal{A}_n$ are consistent,
hence $\varphi_p(A_n) \to \varphi(A_n)$ as $p \to \infty$ whatever be $n$ and $A_n \in \mathcal{A}_n$,
then $\varphi$ extends to a $\sigma$-additive bounded set function on $\mathcal{A}_\infty$.

The assertion follows from what precedes. For, clearly, the total variations
$\bar{\varphi}_n$ on $\mathcal{A}_n$ form a nondecreasing bounded sequence on $\bigcup \mathcal{A}_n$, in
the sense of 1°. Hence $\lim \bar{\varphi}_n$ is continuous at $\emptyset$ on $\bigcup \mathcal{A}_n$ and, a fortiori,
so is $\varphi$. Now use Jordan decomposition and generalization in 4.1.

4.4 Lebesgue-Stieltjes measures and distribution functions. Complete
measures on the Borel field in a real line $R = (-\infty, +\infty)$ did, and
still do, play a prominent role. However, being set functions, they are 
not easy to handle with the tools of classical analysis, for methods of 
analysis were developed to deal primarily with finite point functions on 
R. It is, therefore, of the greatest methodological importance to es- 
tablish a link between the modern notion of measure and the classical 
notions. This will be done by showing that there is a class of point 
functions on R which can be placed in a one-to-one correspondence with 
a very wide class of measures. In this manner, investigations of meas- 
ures (and, thereafter, of integrals) will be reduced to investigations of 
the corresponding point functions and, thus, the familiar methods of 
analysis will apply. Whatever be these point functions they will be 
said to represent the corresponding measure. 

Among possible representations of measures there are two which are
fundamental: "distribution functions", which represent measures assigning
finite values to finite intervals, to be called Lebesgue-Stieltjes
(L.S.) measures, that we shall introduce now, and "characteristic functions",
which represent the subclass of finite Lebesgue-Stieltjes measures
required in connection with probability problems, that we shall introduce
in Part II. Let $\mathcal{B}$ be the Borel field in $R$ and let $\mu$ be a Lebesgue-Stieltjes
measure. The completion of $\mathcal{B}$ for $\mu$ will be denoted by $\mathcal{B}_\mu$,
and called a Lebesgue-Stieltjes field in $R$, and its elements will be called
Lebesgue-Stieltjes sets in $R$.

A function on R which is finite, nondecreasing, and continuous from 
the left is called a distribution function (d.f.). Two d.f.’s will be said 
to be equivalent if they differ by some fixed but arbitrary constant. 
This notion of equivalence has the usual properties of equivalence—it 
is reflexive, transitive, and symmetric. Thus, the class of all d.f.’s 
splits into equivalence classes. As the correspondence theorem below 
(Lebesgue, Radon) shows, the one-to-one correspondence between L.S.- 
measures and d.f.’s is not a correspondence between L.S.-measures and 
individual d.f.’s but a correspondence between L.S.-measures and classes 
of equivalent d.f.’s, each class to be represented by one of its elements, 
arbitrarily chosen. 

Let $F$, with or without affixes, denote a d.f. and define its increment
function by
\[ F[a, b) = F(b) - F(a), \qquad -\infty < a \le b < +\infty. \]

Since two equivalent d.f.'s have the same increment function and conversely,
it follows that every class of equivalent d.f.'s is characterized
by its increment function. Moreover, the defining properties of d.f.'s
are equivalent to the following:
\[ \text{(i)} \quad 0 \le F[a, b) < \infty, \qquad \text{(ii)} \quad F[a, b) \to 0 \text{ as } a \uparrow b, \]
and
\[ \text{(iii)} \quad \sum_{k=1}^{n} F[a_k, b_k) + \sum_{k=1}^{n-1} F[b_k, a_{k+1}) = F[a_1, b_n) \]
where $a \le b$, $a_1 \le b_1 \le a_2 \le \cdots \le a_n \le b_n$ are arbitrary.


A. CORRESPONDENCE THEOREM. The relation

$$\mu[a, b) = F[a, b), \qquad -\infty < a \le b < +\infty,$$

establishes a one-to-one correspondence between L.S.-measures $\mu$ and
d.f.’s $F$ defined up to an equivalence.


Proof. Let $\mathcal{B}_I$ be the class of all intervals $[a, b)$,
$-\infty < a \le b < +\infty$. $\mathcal{B}_I$ is closed under formation
of finite intersections. The minimal field $\mathcal{B}_0$ over
$\mathcal{B}_I$ is the class of all finite sums of elements of
$\mathcal{B}_I$ and of intervals of the form $(-\infty, a)$,
$[b, +\infty)$, and the minimal $\sigma$-field over $\mathcal{B}_0$ is
the Borel field $\mathcal{B}$.

The proof of the correspondence theorem is summarized by the dia- 
gram below, where ¢ represents an arbitrary constant: 


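$$F + c \text{ on } R \iff \mu \text{ on } \mathcal{B}_I \iff \mu \text{ on } \mathcal{B}_0 \iff \mu \text{ on } \mathcal{B} \iff \mu \text{ on } \mathcal{B}_\mu.$$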


1° $\mu$ on $\mathcal{B}_\mu \Rightarrow F + c$ on $R$. For, $\mu$ on
$\mathcal{B}_\mu$ determines its restriction to $\mathcal{B}_I$ and, from
properties of L.S.-measures, it follows that the relation

$$F[a, b) = \mu[a, b)$$

determines an increment function with properties (i), (ii), and (iii)
given above.


2° $\mu$ on $\mathcal{B}_0 \Rightarrow \mu$ on $\mathcal{B}_\mu$. For,
$R$ being a denumerable sum of finite intervals, the measure $\mu$ on
$\mathcal{B}_0$ is $\sigma$-finite and the extension theorem applies,
followed by completion.


3° $\mu$ on $\mathcal{B}_I \Rightarrow \mu$ on $\mathcal{B}_0$. It
suffices to prove that if $A = \sum_k I_k \in \mathcal{B}_0$,
$I_k \in \mathcal{B}_I$, then $\mu(A)$ is determined by the
$\sigma$-additivity requirement $\mu(A) = \sum_k \mu(I_k)$, that is, if
$A$ can also be written as $\sum_j I'_j$, where
$I'_j \in \mathcal{B}_I$, then $\sum_j \mu(I'_j) = \sum_k \mu(I_k)$.
Since $\mu$ on $\mathcal{B}_I$ is additive and

$$I'_j = A I'_j = \sum_k I_k I'_j, \qquad I_k = A I_k = \sum_j I'_j I_k,$$

it follows that

$$\sum_j \mu(I'_j) = \sum_j \sum_k \mu(I_k I'_j) = \sum_k \sum_j \mu(I'_j I_k) = \sum_k \mu(I_k),$$

and the assertion is proved.


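4° $F + c \Rightarrow \mu$ on $\mathcal{B}_I$. We have to prove that the
relation $\mu[a, b) = F[a, b)$ determines a measure $\mu$ on
$\mathcal{B}_I$, that is, if $I = \sum I_n$, where $I = [a, b)$ and
$I_n = [a_n, b_n)$, then $\mu I = \sum \mu I_n$. By interchanging, if
necessary, the subscripts, we can assume that, for every $n$,

$$a \le a_1 \le b_1 \le a_2 \le \cdots \le a_n \le b_n \le b.$$

It follows that

$$\sum_{k=1}^{n} \mu(I_k) = \sum_{k=1}^{n} F[a_k, b_k) \le \sum_{k=1}^{n} F[a_k, b_k) + \sum_{k=1}^{n-1} F[b_k, a_{k+1}) = F[a_1, b_n) \le F[a, b) = \mu(I),$$

and, letting $n \to \infty$, we get $\sum \mu(I_n) \le \mu(I)$.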

It remains for us to prove the reverse inequality. We exclude the
trivial case $a = b$, select $\epsilon > 0$ such that
$\epsilon < b - a$ and set $I^\epsilon = [a, b - \epsilon]$. Because of
the continuity from the left, for every $n$ there is an
$\epsilon_n > 0$ such that
$F[a_n - \epsilon_n, a_n) < \epsilon / 2^n$. If
$I_n^\epsilon = (a_n - \epsilon_n, b_n)$, then, from
$I^\epsilon \subset \bigcup I_n^\epsilon$ it follows, by the Heine-Borel
lemma, that there is an $n_0$ finite such that
$I^\epsilon \subset \bigcup_{k=1}^{n_0} I_k^\epsilon$. Let
$k_1 \le n_0$ be such that $a \in I_{k_1}^\epsilon$ and, if
$b_{k_1} < b - \epsilon$, then let $k_2 \le n_0$ be such that
$b_{k_1} \in I_{k_2}^\epsilon$. Continue in this manner until some
$b_{k_m} \ge b - \epsilon$; the process necessarily stops for some
$m \le n_0$. Omitting intervals that were not selected and, if
necessary, changing the subscripts, it follows that
$I^\epsilon \subset \bigcup_{k=1}^{m} I_k^\epsilon$ and

$$a_1 - \epsilon_1 < a < b_1, \qquad a_{k+1} - \epsilon_{k+1} < b_k < b_{k+1} \quad \text{for } k = 1, 2, \cdots, m - 1, \qquad a_m - \epsilon_m < b - \epsilon \le b_m.$$




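Therefore,

$$F[a, b - \epsilon) \le F[a_1 - \epsilon_1, b_m) = F[a_1 - \epsilon_1, b_1) + \sum_{k=1}^{m-1} F[b_k, b_{k+1}) \le \sum_{k=1}^{m} F[a_k - \epsilon_k, b_k) \le \sum_{k=1}^{\infty} F[a_k, b_k) + \epsilon$$

and, letting $\epsilon \to 0$,

$$F[a, b) \le \sum F[a_n, b_n), \quad \text{that is,} \quad \mu(I) \le \sum \mu(I_n),$$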


which completes the proof of the final assertion and, hence, of the cor- 
respondence theorem. 

Particular case. If $F$ is defined, up to an additive constant, by
$F(x) = x$, $x \in R$, then the corresponding measure of an interval is
its “length.” The extension of “length” to a measure $\mu$ on
$\mathcal{B}$ and the completed measure $\mu$ on $\mathcal{B}_\mu$ are
called Lebesgue measure on $\mathcal{B}$ or $\mathcal{B}_\mu$,
respectively, and $\mathcal{B}_\mu$ will be called Lebesgue field. The
Lebesgue measure is at the root of the general notion of measure.
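
As an illustration of the relation $\mu[a, b) = F[a, b)$, the following
is a minimal Python sketch, not part of the original text. The function
names `increment`, `lebesgue_df`, and `unit_jump_df` are hypothetical
and chosen only for this example. It evaluates the measure of a
half-open interval for the d.f. $F(x) = x$ (length) and for a
left-continuous d.f. with a unit jump at 0 (a unit mass at the point 0).

```python
def increment(F, a, b):
    """Return F[a, b) = F(b) - F(a), the L.S.-measure of [a, b)."""
    if a > b:
        raise ValueError("need a <= b")
    return F(b) - F(a)


def lebesgue_df(x):
    # F(x) = x: the measure of an interval is its length.
    return x


def unit_jump_df(x):
    # Left-continuous d.f. with a unit jump at 0:
    # F(x) = 0 for x <= 0 and F(x) = 1 for x > 0.
    return 1.0 if x > 0 else 0.0


length = increment(lebesgue_df, 2.0, 5.0)
# [0, 1) contains the point 0, so it carries the unit mass.
mass_including_zero = increment(unit_jump_df, 0.0, 1.0)
# [-1, 0) does not contain 0, so it carries no mass.
mass_excluding_zero = increment(unit_jump_df, -1.0, 0.0)
```

Continuity from the left is what makes the half-open intervals
$[a, b)$ the natural ones here: the mass at 0 is counted in $[0, 1)$ and
not in $[-1, 0)$.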

Remark. We can define a L.S.-measure on the Borel field
$\bar{\mathcal{B}}$ (the minimal $\sigma$-field over the class of all
intervals in $\bar{R} = [-\infty, +\infty]$) and, hence, on
$\bar{\mathcal{B}}_\mu$, by adjoining to a L.S.-measure on
$\mathcal{B}_\mu$ arbitrary measures for the sets $\{-\infty\}$ and
$\{+\infty\}$.

Extension. The preceding definitions, proofs, and results remain valid,
word for word, if Borel lines are replaced by finite-dimensional Borel
spaces $R^N = R_1 \times \cdots \times R_N$, provided the following
interpretation of symbols is used: $a, b, x, \cdots$ are points in
$R^N$, say, $a = (a_1, \cdots, a_N)$; $a < b$ ($a \le b$) means that
$a_k < b_k$ ($a_k \le b_k$) for $k = 1, \cdots, N$. $F$ on $R^N$ is a
function with values $F(a) = F(a_1, \cdots, a_N)$ and increments
$F[a, b)$ are defined by

$$F[a, b) = \Delta_{b-a} F(a) = \Delta_{b_1 - a_1} \cdots \Delta_{b_N - a_N} F(a_1, a_2, \cdots, a_N)$$

where, for every $k$, $\Delta_{b_k - a_k}$ denotes the difference
operator of step $b_k - a_k$ acting on $a_k$. For instance, if $N = 2$,

$$\begin{aligned}
\Delta_{b-a} F(a) &= \Delta_{b_1 - a_1} \Delta_{b_2 - a_2} F(a_1, a_2) = \Delta_{b_1 - a_1} \{F(a_1, b_2) - F(a_1, a_2)\} \\
&= F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)
\end{aligned}$$

and, in particular, if $F(a_1, a_2) = a_1 a_2$ is the area of the
rectangle with sides 0 to $a_1$ and 0 to $a_2$, then
$\Delta_{b-a} F(a) = (b_1 - a_1)(b_2 - a_2)$ is the area of the
rectangle with sides $a_1$ to $b_1$ and $a_2$ to $b_2$.

The defining properties of a d.f. $F$ on $R^N$ become:

$$-\infty < F < +\infty, \qquad F[a, b) = \Delta_{b-a} F(a) \ge 0, \qquad F[a, b) \to 0$$

as $a \uparrow b$, that is, $a_1 \uparrow b_1, \cdots, a_N \uparrow b_N$.
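
The iterated difference operator can be sketched in Python as an
alternating sum over the $2^N$ corners of the interval $[a, b)$. This is
an illustrative sketch only; the names `increment_nd` and `area_df` are
hypothetical and do not come from the text.

```python
from itertools import product


def increment_nd(F, a, b):
    """Return F[a, b) = Delta_{b-a} F(a) for points a, b in R^N.

    Each coordinate is taken either at a_k or at b_k; the sign of a
    corner is (-1) raised to the number of coordinates taken at a_k.
    """
    n = len(a)
    total = 0.0
    for picks in product((0, 1), repeat=n):
        corner = [b[k] if picks[k] else a[k] for k in range(n)]
        sign = (-1) ** (n - sum(picks))
        total += sign * F(*corner)
    return total


def area_df(x1, x2):
    # F(a1, a2) = a1 * a2, the area of the rectangle with sides 0..a1, 0..a2.
    return x1 * x2


# Rectangle with sides 1..4 and 2..6.
rect = increment_nd(area_df, (1.0, 2.0), (4.0, 6.0))
```

For $N = 2$ the loop reproduces the four-term expression
$F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)$, which here
equals $(4 - 1)(6 - 2)$.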
Product-d.f.’s and product-measures. A very important particular case
is that of product-d.f.’s:

$$F(x_1, \cdots, x_N) = \prod_{k=1}^{N} F_k(x_k), \qquad x_k \in R_k,$$

where the $F_k$ on $R_k$ are d.f.’s. Then $F$ on $R^N$ is a d.f., for,

$$\Delta_{b-a} F(a) = \prod_{k=1}^{N} \Delta_{b_k - a_k} F_k(a_k) \ge 0$$

and the other defining properties are clearly satisfied.
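
The factorization of the increment of a product-d.f. can be checked
numerically. The sketch below is an illustration under assumptions: the
three one-dimensional d.f.’s and the helper names (`increment_nd`,
`f1`, `f2`, `f3`, `product_df`) are hypothetical choices made for this
example only.

```python
from itertools import product
from math import isclose, prod


def increment_nd(F, a, b):
    """Alternating sum of F over the corners of [a, b) in R^N."""
    n = len(a)
    total = 0.0
    for picks in product((0, 1), repeat=n):
        corner = [b[k] if picks[k] else a[k] for k in range(n)]
        total += (-1) ** (n - sum(picks)) * F(*corner)
    return total


# Three one-dimensional d.f.'s (finite, nondecreasing, left-continuous).
def f1(x):
    return x


def f2(x):
    return 2.0 * x


def f3(x):
    return 1.0 if x > 0 else 0.0


def product_df(x1, x2, x3):
    return f1(x1) * f2(x2) * f3(x3)


a = (-1.0, 0.0, -1.0)
b = (2.0, 3.0, 1.0)

# Left side: the N-dimensional increment of the product-d.f.
lhs = increment_nd(product_df, a, b)
# Right side: the product of the one-dimensional increments.
rhs = prod(f(bk) - f(ak) for f, ak, bk in zip((f1, f2, f3), a, b))
```

Both sides agree because each difference operator acts on one coordinate
only, so it passes through the factors that do not depend on that
coordinate.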

Every d.f. $F_k$ determines a measure $\mu_k$ on the Borel field in
$R_k$, by means of the relation $\mu_k[a_k, b_k) = F_k[a_k, b_k)$, and
the measure $\mu$ on the product Borel field determined by means of the
relation $\mu[a, b) = F[a, b)$ is clearly the product-measure
$\prod_{k=1}^{N} \mu_k$.

Let now $F_n$ be d.f.’s with $F_n(+\infty) - F_n(-\infty) = 1$, so that
the measures $\mu_n$ are probabilities. Then, by the product-probability
theorem or by the consistency theorem,


B. A sequence $F_n$ of d.f.’s corresponding to probabilities on $R_n$
determines a product-probability on the Borel field in the product space
$\prod R_n$.


This result extends at once to any set $\{F_t, t \in T\}$ of such d.f.’s.


COMPLEMENTS AND DETAILS 


In one guise or another, and especially when they are indefinite integrals, 
signed measures on a fixed o-field are in constant use in measure theory and 
probability theory. Many of the properties established in this book are but 
properties of such set functions. 

Notation. The measurable sets belong to a fixed $\sigma$-field
$\mathcal{A}$ on which the set functions and limits of their sequences
are defined. Unless otherwise stated and with or without affixes,
$A, B, \cdots$ denote sets, $\mu$ denotes a measure, $\varphi$ denotes a
signed measure.

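1. If $\varphi$ is $\sigma$-finite, then there are only countably many
disjoint sets for which $\varphi \ne 0$ in every class.

2. For every $A$ there exists a $B \subset A$ such that
$\bar{\varphi}(A) \le 2 |\varphi(B)|$.

3. If $\varphi_1 \le \varphi_2$, then $\varphi_1^+ \le \varphi_2^+$,
$\varphi_1^- \ge \varphi_2^-$. If $\varphi = \varphi_1 + \varphi_2$,
then $\varphi^\pm \le \varphi_1^\pm + \varphi_2^\pm$.

4. Minimality of the Jordan-Hahn decomposition. If
$\varphi = \mu^+ - \mu^-$, then $\varphi^\pm \le \mu^\pm$.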

We say that $A$ is a $\varphi$-null set, if $\varphi = 0$ on
$\{AA', A' \subset \Omega\}$. We say that $A$ and $B$ are
$\varphi$-equivalent, if they coincide up to a $\varphi$-null set. We
say that a nonempty set $A$ is a $\varphi$-atom, if every measurable
subset of $A$ is $\varphi$-equivalent either to $\emptyset$ or to $A$.

5. The $\varphi$-null sets form a $\sigma$-ring; the null sets of
$\varphi$ and of $\bar{\varphi}$ are the same.

The $\varphi$-equivalence is an equivalence relation (reflexive,
transitive, and symmetric), and $\mathcal{A}$ splits into
$\varphi$-equivalence classes.

6. Every $\varphi$-null set and every measurable set consisting of one
point is a $\varphi$-atom; $\bar{\varphi}(A) = |\varphi(A)|$ for every
$\varphi$-atom $A$. Atoms of $\varphi$ and $\bar{\varphi}$ are the same;
atoms of $\varphi$ are atoms of $\varphi^+$ and $\varphi^-$, but the
converse is not necessarily true.

If $A$ is a $\varphi$-atom, then $\varphi = 0$ or $\varphi(A)$ on
$A \cap \mathcal{A}$; if $\varphi$ is finite, then the converse is true.
What if $\varphi$ is $\sigma$-finite? What about $\varphi = \infty$
except for $\emptyset$?

7. If $\mu$ is finite, then $\Omega = \sum A_j + A$ where the $A_j$ or
$A$ may be absent but, if present, then the $A_j$ are $\mu$-atoms of
positive measure and, for every $B \subset A$ of positive measure, $\mu$
takes every value $c$ between 0 and $\mu B$ for measurable subsets of
$B$. This decomposition of $\Omega$ is determined up to $\mu$-null sets.
Can $\mu$ be replaced by $\varphi$?

(There is only a countable number of $\mu$-equivalence classes of such
$A_j$’s. Select representatives $A_j$ of these classes and let
$B \subset A = \Omega - \sum A_j$. Select inductively sets
$C_n \in \mathcal{C}_n$ such that $\mu C_n > \sup \mu C - \frac{1}{n}$
for all $C \in \mathcal{C}_n$, where $\mathcal{C}_n$ is the class of all
$C \subset B - (C_1 \cup C_2 \cup \cdots \cup C_{n-1})$ for which
$\mu C \le c - \mu(C_1 \cup C_2 \cup \cdots \cup C_{n-1})$. Then
$\mu C = c$ for $C = \bigcup C_n$.)

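8. If $\varphi$ is finitely additive, $\mu$ is finite, and
$\mu A_n \to 0$ implies $\varphi A_n \to 0$, then $\varphi$ is
$\sigma$-additive.

We say that $\varphi$ is $\mu$-continuous if $\mu A = 0$ implies
$\varphi A = 0$.

9. If $\mu A_n \to 0$ implies $\varphi A_n \to 0$
($\bar{\varphi} A_n \to 0$), then $\varphi$ is $\mu$-continuous. If
$\varphi$ is finite, then the converse is true.

(Assume the contrary of the converse; there exist $\epsilon > 0$ and
$A_n$ such that $\mu A_n < \frac{1}{2^n}$ and
$\bar{\varphi} A_n \ge \epsilon$. Then $\mu B = 0$ and
$\bar{\varphi} B \ge \epsilon$ for $B = \limsup A_n$.)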
What if 9 is o-finite? What about @ consisting of all subsets of a denumer- 
able space of points w,, andu{w,} = = , {wn} =n. What about u replaced by 


Po? 

10. If the $\mu_j$ are finite measures, then there exists a $\mu$ such
that all the $\mu_j$ are $\mu$-continuous. (Take
$\mu = \sum \mu_j / 2^j \mu_j \Omega$.) What about $\mu_j$’s replaced by
$\varphi_j$’s?

Let $\mathcal{B} \subset \mathcal{A}$ be a $\sigma$-field such that the
measurable subsets of elements of $\mathcal{B}$ belong to $\mathcal{B}$.
Let $\mathcal{B}(\varphi)$ be the class of sets such that their subsets
which belong to $\mathcal{B}$ are $\varphi$-null. Call the sets of
$\mathcal{B}$ “singular,” and the sets of $\mathcal{B}(\varphi)$
“regular.” Call $\varphi$ regular (singular) if every singular (regular)
set is $\varphi$-null.

Let $\varphi_r = \varphi_r^+ - \varphi_r^-$,
$\varphi_s = \varphi_s^+ - \varphi_s^-$, defined by

$$\varphi_r^\pm(A) = \sup \varphi^\pm(B) \quad \text{for all regular } B \subset A,$$
$$\varphi_s^\pm(A) = \sup \varphi^\pm(B) \quad \text{for all singular } B \subset A.$$

11. Decomposition theorem. $\varphi_r$ is regular, $\varphi_s$ is
singular, and $\varphi = \varphi_r + \varphi_s$. If $\varphi$ is finite,
then the decomposition of $\varphi$ into a regular and a singular part
is unique. What if $\varphi$ is $\sigma$-finite? What if $\mathcal{A}$
consists of all subsets of a noncountable space, and $\varphi(A)$ equals
the number of points of $A$? (Proceed as follows:

(i) $\mathcal{B}(\varphi) = \mathcal{B}(\bar{\varphi}) = \mathcal{B}(\varphi^+) \cap \mathcal{B}(\varphi^-)$ is a $\sigma$-field.

(ii) $\varphi_r$ ($\varphi_s$) is a regular (singular) signed measure.

(iii) Every $A$ contains disjoint $A_r$ regular and $A_s$ singular such
that $\varphi_r^\pm(A) = \varphi^\pm(A_r)$,
$\varphi_s^\pm(A) = \varphi^\pm(A_s)$.

(iv) If $A = A'_r + A'_s$ with $A'_r$ regular and $A'_s$ singular, then
we can take $A_r = A'_r$ and $A_s = A'_s$.

(v) If $\varphi$ is finite, every $A$ can be so decomposed.)

12. We can take for singular sets:


(i) the $\mu$-null sets: regular (singular) becomes $\mu$-continuous
($\mu$-discontinuous);

(ii) the countable measurable sets: regular (singular) becomes
continuous (purely discontinuous);

(iii) the countable sums of atoms: regular (singular) becomes nonatomic
(atomic).


In each case investigate the regular and singular parts.

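13. Intermediate-value theorem (compare with continuous function on a
connected set). If $\varphi$ is nonatomic and $A_n \uparrow A$ with
$\varphi A_n$ finite, then $\varphi$ takes every value between
$-\varphi^- A$ and $+\varphi^+ A$ for measurable subsets in $A$. (See 7.)
What if $\mathcal{A}$ consists of all sets in a noncountable space,
$\varphi(A) = 0$ or $\infty$ according as $A$ is countable or not?

In what follows, the $\varphi_n$ are $\sigma$-additive but, unless
otherwise stated, $\lim \varphi_n$ is not assumed to be
$\sigma$-additive.

14. If $\varphi_n \to \varphi$ $\sigma$-additive, then
$\varphi^\pm \le \liminf \varphi_n^\pm$. If, moreover,
$\varphi_n \uparrow$ or $\varphi_n \downarrow$, then
$\varphi^\pm = \lim \varphi_n^\pm$.

15. If $\varphi_n \uparrow$ ($\downarrow$) and $\varphi_1 > -\infty$
($< +\infty$), then $\varphi_n \to \varphi$ $\sigma$-additive.

16. If $\varphi_n \to \varphi$ uniformly on $\mathcal{A}$ and
$\varphi > -\infty$ or $\varphi < +\infty$, then $\varphi$ is
$\sigma$-additive.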

17. To a measure space $(\Omega, \mathcal{A}, \mu)$ associate a complete
metric space $(\mathfrak{X}, d)$ as follows: $\mathfrak{X}$ is the space
of all sets $A, B$ of finite measure, $d$ is a metric defined by
$d(A, B) = \mu(AB^c + A^cB)$. Prove that the metric space is complete.

(If $A_n$ is a mutually convergent sequence in $\mathfrak{X}$, then the
sequence $I_{A_n}$ mutually converges in measure and hence converges in
measure; see 6.3.)

If $\nu$ on $\mathcal{A}$ is a finite $\mu$-continuous measure, then
$\nu$ is defined and continuous on $(\mathfrak{X}, d)$.

We say that the $\varphi_n$ are uniformly $\mu$-continuous if
$\mu A_m \to 0$ implies $\varphi_n A_m \to 0$ uniformly in $n$, as
$m \to \infty$.

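18. Let $\mu$ be $\sigma$-finite. If the finite $\varphi_n$ are
$\mu$-continuous and $\lim \varphi_n$ exists and is finite, then the
$\varphi_n$ are uniformly $\mu$-continuous and
$\lim \varphi_n = \varphi$ is $\mu$-continuous and $\sigma$-additive.
(For every $\epsilon > 0$, set

$$\mathfrak{X}_k = \bigcap_{m=k}^{\infty} \bigcap_{n=k}^{\infty} \left\{A \in \mathfrak{X};\ |\varphi_m A - \varphi_n A| \le \frac{\epsilon}{3}\right\}.$$

By (17), every $\mathfrak{X}_k$ is closed. By Baire’s category theorem,
there exists $k_0$, $d_0$ and $A_0 \in \mathfrak{X}$ such that
$\{A \in \mathfrak{X};\ d(A, A_0) < d_0\} \subset \mathfrak{X}_{k_0}$.
Let $0 < \delta_0 < d_0$ such that $|\varphi_n A| < \epsilon/3$ whenever
$\mu A < \delta_0$ and $n \le k_0$. If $\mu A < \delta_0$, then
$d(A_0 - A, A_0) < d_0$, $d(A_0 \cup A, A_0) < d_0$, and

$$|\varphi_n A| \le |\varphi_{k_0} A| + |\varphi_n(A_0 \cup A) - \varphi_{k_0}(A_0 \cup A)| + |\varphi_n(A_0 - A) - \varphi_{k_0}(A_0 - A)|,$$

which is less than $\epsilon$ for $n \ge k_0$.)
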
19. If finite $\varphi_n \to \varphi$ finite, then $\varphi$ is $\sigma$-additive. (If $|\varphi_n| \le c_n$, set


nd = S| ond | and apply 18) 


Chapter II


MEASURABLE FUNCTIONS AND INTEGRATION 


§ 5. MEASURABLE FUNCTIONS 


5.1 Numbers. Spaces built with numbers are prototypes of all 
spaces, and functions whose values are numbers are prototypes of all 
functions. 

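By a number $x$ we mean either a usual real number (a finite number) or
one of the symbols $+\infty$ and $-\infty$ (infinite numbers). These
symbols are defined by the following properties:

$$-\infty \le x \le +\infty,$$

$$\pm\infty = (\pm\infty) + x = x + (\pm\infty), \qquad \frac{x}{\pm\infty} = 0 \qquad \text{if } -\infty < x < +\infty,$$

$$x(\pm\infty) = (\pm\infty)x = \begin{cases} \pm\infty & \text{if } 0 < x \le +\infty \\ 0 & \text{if } x = 0 \\ \mp\infty & \text{if } -\infty \le x < 0. \end{cases}$$

The expression $+\infty - \infty$ is meaningless, so that, when speaking
of a “sum” of two numbers, we assume that, if one of them is
$\pm\infty$, the other one is not $\mp\infty$; then the sum exists.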
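
These conventions differ from IEEE floating-point arithmetic in one
respect worth noting: here $0 \cdot (\pm\infty) = 0$, whereas a float
product of zero and infinity is not a number. A minimal Python sketch of
the conventions follows; the helper names `ext_add` and `ext_mul` are
hypothetical and introduced only for illustration.

```python
import math

INF = math.inf


def ext_add(x, y):
    """Sum of two extended numbers; +inf - inf is left undefined."""
    if math.isinf(x) and math.isinf(y) and x != y:
        raise ValueError("+inf - inf is meaningless")
    return x + y


def ext_mul(x, y):
    """Product of two extended numbers with the convention 0 * (+-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0
    return x * y


zero_times_inf = ext_mul(0.0, INF)       # convention of the text: 0
negative_times_inf = ext_mul(-2.0, INF)  # sign flips: -inf
finite_plus_inf = ext_add(3.0, -INF)     # absorbs the finite term: -inf
```

The guard in `ext_add` mirrors the requirement that, when one summand is
$\pm\infty$, the other is not $\mp\infty$.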

The reason for the introduction of infinite numbers lies in the fact
that, then, $\sup x_t$ and $\inf x_t = -\sup(-x_t)$, where $t$ varies
over an arbitrary set $T$, always exist (but may be infinite). Moreover,
if inclusion, union, and intersection of numbers are defined by
$x \le y$, $\sup x_t$ and $\inf x_t$ respectively, then these operations
have properties of the corresponding set operations; in particular,
limits of monotone sequences of numbers always exist, but may be
infinite.
If, as $n \to \infty$, the limit $x$ of a sequence $x_n$ of numbers
exists, we write $x = \lim x_n$ or $x_n \to x$ and say that $x_n$
converges to $x$; if $x$ is infinite, say, $+\infty$, one also says that
$x_n$ diverges to $+\infty$. The Cauchy mutual convergence criterion is
valid only for finite limits: $x_n$ converges to some finite $x$ if, and
only if, $x_m - x_n \to 0$ (as $m, n \to \infty$) or, equivalently, if
$x_{n+\nu} - x_n \to 0$ uniformly in $\nu$. On the other hand, the
Bolzano-Weierstrass lemma remains valid without the usual restriction of
boundedness: every sequence of numbers is compact, that is, contains a
convergent subsequence, but if the sequence is not bounded then the
limits may be infinite.

The set of all finite numbers is a real line $R = (-\infty, +\infty)$
and the set of all numbers is an extended real line
$\bar{R} = [-\infty, +\infty]$. The basic class of sets in $R$ is the
class of intervals; there are four types of finite intervals of
respective form:


$[a, b)$: set of all points $x$ such that $a \le x < b$;
$(a, b]$: set of all points $x$ such that $a < x \le b$;
$(a, b)$: set of all points $x$ such that $a < x < b$;
$[a, b]$: set of all points $x$ such that $a \le x \le b$.


The minimal $\sigma$-field over the class of all intervals in $R$ is the
Borel field in $R$ and its elements are Borel sets in $R$. The Borel
field in $R$ coincides with the minimal $\sigma$-field over the subclass
of all intervals of one of the foregoing four types, since countable
operations performed upon elements of one of these subclasses yield any
element of the other subclasses; for example,

$$(a, b) = \bigcup_n \left[a + \frac{1}{n}, b\right), \qquad [a, b] = \bigcap_n \left[a, b + \frac{1}{n}\right),$$

etc. Similarly, the Borel field in $R$ is the minimal $\sigma$-field
over the subclass of all infinite intervals of the form $(-\infty, x)$,
$-\infty \le x \le +\infty$, since any finite interval $[a, b)$ is
obtainable as a difference
$\Delta_{b-a}(-\infty, a) = (-\infty, b) - (-\infty, a)$. The Borel
field in $\bar{R}$ can be defined similarly by means of any of the
foregoing types where $-\infty \le a \le b \le +\infty$, or by means of
the intervals $[-\infty, x)$, $-\infty \le x \le +\infty$; but,
frequently, the most convenient way is to take the minimal
$\sigma$-field over the class formed by the Borel field in $R$ and the
two sets $\{-\infty\}$, $\{+\infty\}$.

Extension. The preceding notions extend at once to finite-dimensional
real spaces. The set of all ordered N-uples $x = (x_1, \cdots, x_N)$ of
finite numbers is the N-dimensional real space $R^N$ or, equivalently,
the product space $\prod_{\nu=1}^{N} R_\nu$ of $N$ real lines
$R_\nu = (-\infty < x_\nu < +\infty)$. If every $R_\nu$ is replaced by
$\bar{R}_\nu = [-\infty \le x_\nu \le +\infty]$, then we have the
extended N-dimensional real space $\bar{R}^N$. If $a, b \in R^N$, then
$a \le b$ means that $a_\nu \le b_\nu$ for $\nu = 1, 2, \cdots, N$, and,
similarly, for $a < b$, $a = b$.

An interval, say $[a, b)$, will also be written more explicitly as
$[a_1, a_2, \cdots, a_N; b_1, b_2, \cdots, b_N)$, and

$$[a, b) = \Delta_{b-a}(-\infty, a) = \Delta_{b_1 - a_1} \Delta_{b_2 - a_2} \cdots \Delta_{b_N - a_N}(-\infty, -\infty, \cdots, -\infty; a_1, a_2, \cdots, a_N)$$

where $\Delta_{b_\nu - a_\nu}$ is the difference operator of step
$b_\nu - a_\nu$ acting on $a_\nu$. For example, if $N = 2$, then

$$\begin{aligned}
[a_1, a_2; b_1, b_2) &= \Delta_{b_1 - a_1} \Delta_{b_2 - a_2}(-\infty, -\infty; a_1, a_2) \\
&= \Delta_{b_1 - a_1}\{(-\infty, -\infty; a_1, b_2) - (-\infty, -\infty; a_1, a_2)\} \\
&= (-\infty, -\infty; b_1, b_2) - (-\infty, -\infty; a_1, b_2) - (-\infty, -\infty; b_1, a_2) + (-\infty, -\infty; a_1, a_2).
\end{aligned}$$

With this interpretation, the foregoing definitions of types of
intervals and, thereafter, of Borel fields, remain the same.

5.2 Numerical functions. A numerical function $X$ on a space $\Omega$ is
a function on $\Omega$ to $\bar{R}$, defined by assigning to every point
$\omega \in \Omega$ a single number $x = X(\omega)$, the value of $X$ at
$\omega$. If infinite values are excluded, $X$ is a finite function or,
equivalently, a function on $\Omega$ to $R$. $\Omega$ is called the
domain of $X$ and $\bar{R}$ (or $R$) is called the range space of $X$.
The functions $X^+ = X I_{[X \ge 0]}$ and $X^- = -X I_{[X < 0]}$ will be
called the positive part and the negative part of $X$, respectively, and
we have

$$X = X^+ - X^-, \qquad |X| = X^+ + X^-.$$
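
A small Python sketch of the positive and negative parts, evaluated
pointwise on values of $X$, is given here for illustration. The function
names `positive_part` and `negative_part` and the sample values are
hypothetical.

```python
import math


def positive_part(x):
    """X^+ at a point where X takes the value x."""
    return x if x >= 0 else 0.0


def negative_part(x):
    """X^- at a point where X takes the value x (a nonnegative number)."""
    return -x if x < 0 else 0.0


# Sample values of X, including the infinite numbers.
values = [-math.inf, -2.5, 0.0, 1.0, math.inf]

# X = X^+ - X^- and |X| = X^+ + X^- at every sample value.
recomposed = [positive_part(x) - negative_part(x) for x in values]
absolute = [positive_part(x) + negative_part(x) for x in values]
```

Since at most one of the two parts is nonzero at any point, the
difference $X^+ - X^-$ never produces the meaningless expression
$+\infty - \infty$.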


Unless otherwise stated, all functions will be numerical functions and, 
in general, will be denoted by X, Y, ---, with or without affixes. 

If definitions or relations between values of given functions hold for
every $\omega$ belonging to a set $A \subset \Omega$, we say that these
definitions or relations hold on $A$ and drop “on $A$” if $A = \Omega$.
For example,


$|X| < \infty$ means that $X$ is finite;

$X = 0$ on $A$ means that $X(\omega) = 0$ for every $\omega \in A$;

$X = \inf X_n$ means that $X(\omega) = \inf X_n(\omega)$ for every $\omega \in \Omega$;

$X_n \to X$ on $A$ means that $X_n(\omega) \to X(\omega)$ for every $\omega \in A$, etc.


Conversely, the set of all $\omega \in \Omega$ on which definitions or
relations hold is denoted by $[\omega; \cdots]$ or, if there is no
confusion possible, by $[\cdots]$ where $\cdots$ stand for the
definitions or relations. For example,

$[X]$ is the set on which $X$ is defined;

$[X \ge Y]$ is the set of all $\omega \in \Omega$ for which $X(\omega) \ge Y(\omega)$;

$[X \in S]$, where $S \subset \bar{R}$, is the set of all
$\omega \in \Omega$ for which the values $X(\omega)$ belong to the set $S$.


The set $[X = x]$ is called the inverse image of the set $\{x\}$ which
consists of $x$ only or, simply, of $x$. Since $X$ is single-valued, the
inverse images of distinct numbers $x$ are disjoint, and the partition
of $\Omega$ into inverse images of all $x \in \bar{R}$ is called the
partition of the domain induced by $X$; we sometimes write
$X = \sum_{x \in \bar{R}} x I_{[X = x]}$ where $I_{[X = x]}$ is the
indicator of $[X = x]$. In particular, if $X$ is countably valued, that
is, takes only a countable number of values $x_j$, then, and only then,

$$X = \sum_j x_j I_{[X = x_j]}.$$


More generally, the set $[X \in S]$ is called inverse image of $S$ and
is also denoted by $X^{-1}(S)$. The symbol $X^{-1}$, which can be
considered as representing a mapping of sets in $\bar{R}$ onto sets in
$\Omega$, is called the inverse function of $X$. Since inverse images of
disjoint sets of $\bar{R}$ are disjoint, it follows easily that


$X^{-1}$ and set operations commute:

$$X^{-1}(S - S') = X^{-1}(S) - X^{-1}(S'), \qquad X^{-1}\left(\bigcup S_t\right) = \bigcup X^{-1}(S_t), \qquad X^{-1}\left(\bigcap S_t\right) = \bigcap X^{-1}(S_t).$$
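
The commutation of inverse images with set operations can be checked on
a finite example. The Python sketch below is illustrative only: the
finite domain standing in for $\Omega$, the function `X`, and the helper
`inverse_image` are assumptions made for the example.

```python
def inverse_image(X, domain, S):
    """Return [X in S]: the set of points of the domain mapped into S."""
    return {w for w in domain if X(w) in S}


# A finite stand-in for the space Omega and a function on it.
domain = set(range(-3, 4))


def X(w):
    return w * w


S = {0, 1, 4}
T = {4, 9}

inv_S = inverse_image(X, domain, S)
inv_T = inverse_image(X, domain, T)

# Inverse images of a difference, a union, and an intersection.
inv_difference = inverse_image(X, domain, S - T)
inv_union = inverse_image(X, domain, S | T)
inv_intersection = inverse_image(X, domain, S & T)
```

Note that $X$ itself is not one-to-one here; the identities hold because
each point of the domain has exactly one value.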


Similarly, $X^{-1}(\mathcal{C})$ or the inverse image of $\mathcal{C}$,
where $\mathcal{C}$ is a class of sets in $\bar{R}$, is the class of all
inverse images of elements of $\mathcal{C}$. Since set operations
commute with inverse functions, it follows that


a. The inverse image of a $\sigma$-field is a $\sigma$-field, the
inverse image of the minimal $\sigma$-field over a class is the minimal
$\sigma$-field over the inverse image of the class, the class of all
sets whose inverse images belong to a $\sigma$-field is a
$\sigma$-field.


The foregoing definitions and properties extend at once to functions
$X = (X_1, \cdots, X_N)$ on $\Omega$ to an N-dimensional real space
$R^N$ (or $\bar{R}^N$) or, equivalently, to N-uples of numerical
functions $X_1, \cdots, X_N$. Classical analysis is concerned with
functions from a real line to a real line or, more generally, from a
finite-dimensional real space $R^N$ to a finite-dimensional real space
$R^{N'}$. Still more generally, let $X$ be a function on $\Omega$ to
$R^N$ and let $g$ be a function on $R^N$ to $R^{N'}$. The function of
function $gX$ defined by $(gX)(\omega) = g(X(\omega))$ is a function on
$\Omega$ to $R^{N'}$. Clearly, its inverse function $(gX)^{-1}$ is a
mapping of sets $S'$ in $R^{N'}$ onto sets in $\Omega$ such that

$$(gX)^{-1}(S') = X^{-1}(g^{-1}(S'))$$

or, in a condensed form,

$$(gX)^{-1} = X^{-1} g^{-1}.$$
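
The relation $(gX)^{-1} = X^{-1} g^{-1}$ can be illustrated on finite
sets. In the Python sketch below, the finite domain, the finite set
`range_space` standing in for the range space of $X$, and the functions
`X` and `g` are all hypothetical choices made for this example.

```python
def inverse_image(f, domain, S):
    """Return the set of points of the domain that f maps into S."""
    return {p for p in domain if f(p) in S}


# Finite stand-ins for Omega and for the range space of X.
domain = set(range(-3, 4))
range_space = set(range(0, 10))


def X(w):
    return w * w


def g(x):
    return x % 3


def gX(w):
    return g(X(w))


S_prime = {1}

# Left side: inverse image under the function of function gX.
direct = inverse_image(gX, domain, S_prime)
# Right side: first pull S' back through g, then through X.
two_step = inverse_image(X, domain, inverse_image(g, range_space, S_prime))
```

The order of the two pull-backs is the reverse of the order in which
$X$ and $g$ are applied to a point.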


5.3 Measurable functions. Classical analysis is concerned primarily
with continuous functions on $R$ to $R'$ or, more generally, on $R^N$ to
$R^{N'}$. However, passages to the limit, which play such a basic role
in analysis, do not, in general, preserve continuity (and also they
cause the appearance of $\pm\infty$). The essential achievement of
modern analysis, due to Borel, Baire, and Lebesgue, is the introduction
of a wider class of functions which is closed under the “usual”
operations of analysis: arithmetic operations and formation of infima,
suprema, and limits of sequences. Those are the functions we intend to
define now.

In the domain $\Omega$ of our functions we select a $\sigma$-field
$\mathcal{A}$ of sets, to be called $\mathcal{A}$-sets or, if there is
no confusion possible, measurable sets; the doublet
$(\Omega, \mathcal{A})$ is called a measurable space. In the range space
$\bar{R}$ of our functions we select the $\sigma$-field
$\bar{\mathcal{B}}$ of Borel sets, the Borel field in $\bar{R}$; the
doublet $(\bar{R}, \bar{\mathcal{B}})$ is an (extended) Borel line.
Thus, our functions are defined on a measurable space
$(\Omega, \mathcal{A})$ to the Borel line $(\bar{R}, \bar{\mathcal{B}})$.
More generally, if the range space is $\bar{R}^N$, then we select the
Borel field $\bar{\mathcal{B}}^N$, and the doublet
$(\bar{R}^N, \bar{\mathcal{B}}^N)$ is an extended Borel space; then the
functions are defined on a measurable space $(\Omega, \mathcal{A})$ to
the Borel space $(\bar{R}^N, \bar{\mathcal{B}}^N)$.

A countably valued function $X = \sum x_j I_{A_j}$ where the sets $A_j$
are measurable is called an elementary measurable function or, simply,
an elementary function; if the number of distinct values of $X$ is
finite, then $X$ is also called a simple function.


(C) Limits of convergent sequences of simple functions are called meas- 
urable functions. 


This is a constructive definition and, because of that, will play an es- 
sential role in the constructive definition of integrals. However, gen- 
eral properties of measurable functions are easier to discover and to 
prove when using the descriptive definition which follows. 


(D) Functions such that inverse images of all Borel sets are measurable 
sets are called measurable functions. 


Yet this definition is not the most economical one, since 


108 MEASURABLE FUNCTIONS AND INTEGRATION [Sec. 5] 


(D’) In (D), it suffices to require measurability of inverse images of
elements of any fixed class $\mathcal{C}$ such that the minimal
$\sigma$-field over $\mathcal{C}$ is the Borel field.


For example, we can take $\mathcal{C}$ to be the class of all intervals,
or the class of all intervals $[-\infty, x]$, etc.

The proof is immediate. Since a mapping $X^{-1}$ preserves all set
operations and the measurable sets form a $\sigma$-field, it follows
that the class of all sets whose inverse images are measurable is a
$\sigma$-field. Therefore, if, according to (D’), it contains
$\mathcal{C}$, then it contains the minimal $\sigma$-field over
$\mathcal{C}$ which, by assumption, is the Borel field.

Similarly, the constructive definition (C) is not the most economical 
one as we shall find in proving the basic theorem below. 


A. MEASURABILITY THEOREM. The constructive and descriptive defi- 
nitions are equivalent, and the class of measurable functions is closed under 
the usual operations of analysis. 


Proof. 1° Let $X_n$ be functions measurable (D), that is, measurable
according to (D) or, equivalently, (D’). Then all sets

$$[\inf X_n < x] = \bigcup [X_n < x], \qquad [-X_n < x] = [X_n > -x]$$

are measurable and, hence, the functions

$$\sup X_n = -\inf(-X_n), \qquad \liminf X_n = \sup_n \left(\inf_{k \ge n} X_k\right)$$

are measurable (D). Thus, the class of functions measurable (D) is
closed under formation of infima, suprema, and limits. But every simple
function $X = \sum x_j I_{A_j}$ is measurable (D), since all sets
$[X \le x] = \sum_{x_j \le x} A_j$ are measurable. Therefore, limits of
convergent sequences of simple functions are measurable (D); in
particular, functions measurable (C) are measurable (D).
2° Conversely, let $X$ be measurable (D) so that the functions

$$X_n = -n I_{[X < -n]} + \sum_{k = -n2^n + 1}^{n2^n} \frac{k - 1}{2^n} I_{\left[\frac{k-1}{2^n} \le X < \frac{k}{2^n}\right]} + n I_{[X \ge n]}, \qquad n = 1, 2, \cdots$$

are simple. Since

$$|X_n(\omega) - X(\omega)| < \frac{1}{2^n} \quad \text{for } |X(\omega)| < n$$

and

$$X_n(\omega) = \pm n \quad \text{for } X(\omega) = \pm\infty,$$

it follows that $X_n \to X$ and this, together with what precedes,
completes the proof of the equivalence of the two definitions of
measurability.

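We observe that if $X$ is nonnegative, then the foregoing functions
$X_n$ become

$$X_n = \sum_{k=1}^{n2^n} \frac{k - 1}{2^n} I_{\left[\frac{k-1}{2^n} \le X < \frac{k}{2^n}\right]} + n I_{[X \ge n]}$$

and we have $0 \le X_n \uparrow X$. Also, if

$$X'_n = \sum_{k=-\infty}^{+\infty} \frac{k - 1}{2^n} I_{\left[\frac{k-1}{2^n} \le X < \frac{k}{2^n}\right]} + \infty I_{[X = +\infty]} - \infty I_{[X = -\infty]},$$

then $|X'_n - X| < \frac{1}{2^n}$ on $[|X| < \infty]$ and $X'_n = X$ on
$[|X| = \infty]$, so that $X'_n \to X$ uniformly.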
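
The simple functions $X_n$ of 2° act on each value of $X$ by truncating
to $[-n, n]$ and rounding down to a multiple of $2^{-n}$. The Python
sketch below evaluates this pointwise; the name `dyadic_approx` is
hypothetical, and the sketch works on a single value $x = X(\omega)$
rather than on a function.

```python
import math


def dyadic_approx(x, n):
    """Value of X_n at a point where X takes the value x.

    Returns -n if x < -n, n if x >= n, and otherwise (k - 1) / 2^n
    where (k - 1) / 2^n <= x < k / 2^n.
    """
    if x < -n:
        return float(-n)
    if x >= n:
        return float(n)
    return math.floor(x * 2**n) / 2**n


# For a nonnegative value the approximations are nondecreasing in n.
approximations = [dyadic_approx(math.pi, n) for n in range(1, 10)]

# Infinite values are sent to +n or -n, which diverge to the same infinity.
at_plus_infinity = dyadic_approx(math.inf, 5)
at_minus_infinity = dyadic_approx(-math.inf, 5)
```

For $|x| < n$ the error $x - X_n$ lies in $[0, 2^{-n})$, which is the
bound used in the proof.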

3° It remains to prove closure under the arithmetic operations. Using
definition (C) and the fact that arithmetic operations commute with
passages to the limit by convergent sequences, it suffices to show that
the class of simple functions is closed under the arithmetic operations.
But much more is true, for if $g$ on $\bar{R}^N$ is an arbitrary
function and $X_k = \sum_j x_{kj} I_{A_{kj}}$, $k = 1, \cdots, N$, are
simple (elementary) functions, then the function of functions

$$g(X_1, \cdots, X_N) = \sum g(x_{1j_1}, \cdots, x_{Nj_N}) I_{A_{1j_1} \cdots A_{Nj_N}}$$

is simple (elementary). This completes the proof.

According to this proof we have new equivalent constructive definitions
of measurable functions that we state now.


(C’) A nonnegative function is measurable if it is the limit of a
nondecreasing sequence of nonnegative simple functions. A function $X$
is measurable if its positive and negative parts $X^+$ and $X^-$ are
measurable.

(C”) A function is measurable if it is the limit of a uniformly
convergent sequence of elementary functions. In particular, every
bounded measurable function is limit of a uniformly convergent sequence
of simple functions.


Definition (C’) will play a central role in the theory of integration. 
Closure under the arithmetic operations is a very particular case of 


a. A Baire function of measurable functions is measurable. 


Proof. Let us recall a (constructive) definition of Baire functions (we
consider only finite-dimensional Borel spaces). Baire functions are
elements of the smallest class closed under passages to the limit
containing all continuous functions. Therefore, since the class of
measurable functions is closed under passages to the limit, it suffices
to prove that


A continuous function of measurable functions is measurable.


Thus, let $g$ on $\bar{R}^N$ be continuous; that is, for every point
$(x_1, \cdots, x_N) \in \bar{R}^N$,

$$g(x'_1, \cdots, x'_N) \to g(x_1, \cdots, x_N) \quad \text{as } x'_1 \to x_1, \cdots, x'_N \to x_N.$$

Let $X_k$, $k = 1, 2, \cdots, N$, be measurable and let $X_{nk}$ be
sequences of simple functions such that $X_{nk} \to X_k$ for every $k$.
We found (in 3°) that the functions $g(X_{n1}, \cdots, X_{nN})$ (that we
assumed tacitly to have meaning) are measurable and hence, by continuity
and closure under passages to the limit, the function

$$g(X_1, \cdots, X_N) = \lim g(X_{n1}, \cdots, X_{nN})$$

is measurable. This completes the proof.


All the foregoing definitions and properties extend at once, and word
for word, to functions on a measurable space to any finite-dimensional
Borel space, provided we replace $\bar{R}$ by $\bar{R}^N$ and leave out
the operations of multiplication and division that we do not define (at
least here) for such functions. For example,


functions such that inverse images of Borel sets in their range space are 
measurable sets in their domain are called measurable functions. 


This extension is useful but, in fact, brings nothing new, for 


b. A function $X = (X_1, \cdots, X_N)$ is measurable if, and only if,
its components $X_1, \cdots, X_N$ are measurable.


In other words such a function is merely an N-uple of numerical
measurable functions.


Proof. If $X = (X_1, \cdots, X_N)$ is measurable, then, for every
$k \le N$, the sets

$$[X_k \le x_k] = X_k^{-1}[-\infty, x_k] = X^{-1}[-\infty, \cdots, -\infty; +\infty, \cdots, +\infty, x_k, +\infty, \cdots, +\infty]$$

are measurable, so that $X_k$ is measurable.

Conversely, if all $X_k$ are measurable, then the sets

$$[X \le x] = [X_1 \le x_1, \cdots, X_N \le x_N] = \bigcap_{k=1}^{N} [X_k \le x_k]$$

are measurable, so that $X = (X_1, \cdots, X_N)$ is measurable.

We give another (descriptive) definition of Baire functions. With this
definition, it is customary to call these functions Borel functions.
A measurable function on a finite-dimensional Borel space to a
finite-dimensional Borel space is called a Borel function. In other
words, $g$ on $R^N$ to $R^{N'}$ is a Borel function if, and only if, the
inverse images of Borel sets $S'$ in $R^{N'}$ are Borel sets $S$ in
$R^N$. The proof of a in this more general case is then immediate and we
have


a’. A Borel function of a measurable function is measurable.


For, if $X$ is a measurable function (not necessarily numerical) and $g$
is a Borel function on the range space of $X$, then, for every Borel set
$S'$ in the range space of $g$, the set
$(gX)^{-1}(S') = X^{-1}(g^{-1}(S'))$ is measurable and, hence, $gX$ is a
measurable function.


§ 6. MEASURE AND CONVERGENCES 


6.1 Definitions and general properties. The notions of “measurable”
sets and “measurable” functions are two out of a triplet of notions, due
essentially to Lebesgue, the third being the notion of “measure” which
gave its name to the two others, and which we shall introduce now.

A function $\varphi$ on a $\sigma$-field $\mathcal{A}$ is said to be
$\sigma$-additive if, for every countable disjoint class
$\{A_j\} \subset \mathcal{A}$,

$$\varphi\left(\sum A_j\right) = \sum \varphi(A_j).$$

To avoid trivialities, it is assumed that at least one value of
$\varphi$, say, $\varphi(A_0)$, $A_0 \in \mathcal{A}$, is finite. Since

$$\varphi(A_0 + \emptyset) = \varphi(A_0) = \varphi(A_0) + \varphi(\emptyset),$$

this assumption is equivalent to $\varphi(\emptyset) = 0$. To avoid
meaningless expressions of the form $+\infty - \infty$, it is assumed
that at least one of the possible values $-\infty$ or $+\infty$ is
excluded.

$\varphi$ is said to be finite if its values are finite, and it is said
to be $\sigma$-finite if the space in which $\mathcal{A}$ is defined can
be partitioned into a countable number of sets in $\mathcal{A}$ for
which the values of $\varphi$ are finite.
A measure $\mu$ on a $\sigma$-field $\mathcal{A}$ is a nonnegative and
$\sigma$-additive function. In other words, $\mu$ is defined by the
three following properties:


(i) $\mu(\sum A_j) = \sum \mu(A_j)$ for every countable disjoint class $\{A_j\} \subset \mathcal{A}$;
(ii) $\mu(A) \ge 0$ for every $A \in \mathcal{A}$;
(iii) $\mu(\emptyset) = 0$.


The value $\mu(A)$ of $\mu$ at $A$ is called the measure of $A$ and, if
there is no confusion possible, we drop the bracket following the symbol
$\mu$.

A measure space $(\Omega, \mathcal{A}, \mu)$ is formed by the space
$\Omega$, the $\sigma$-field $\mathcal{A}$ of measurable sets in this
space, and the measure $\mu$ defined on this $\sigma$-field. Unless
otherwise stated all sets under consideration will be measurable sets in
our measure space. A set of measure 0 is said to be a $\mu$-null set or,
if there is no confusion possible, a null set, and definitions or
relations valid outside a $\mu$-null set are said to be valid almost
everywhere (a.e.). The following properties of the measure $\mu$ are
immediate:


a. wis nondecreasing, and 1s bounded if the space Q ts of finite measure. 
This follows from 
pB=phA+pn(B-—- A 2nd for BDZ. 
b. uw ts sub o-additive: wh) A; S > ud. 
This follows from 
BU 4; = w(41 + A1°A4e +-°:) 
= py + pAy’de +--+ Sudy + udot:::. 
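A. SEQUENCES THEOREM. If $A_n \uparrow A$, then $\mu A_n \uparrow \mu A$ and, in general,

\[ \liminf \mu A_n \ge \mu(\liminf A_n). \]

If $\mu$ is finite, then, moreover,

$A_n \downarrow A$ implies $\mu A_n \downarrow \mu A$, $\limsup \mu A_n \le \mu(\limsup A_n)$,
$A_n \to A$ implies $\mu A_n \to \mu A$.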
Proof. If $A_n \uparrow A$, then, by $\sigma$-additivity,

\[ \mu A = \mu A_1 + \mu(A_2 - A_1) + \cdots = \lim \{\mu A_1 + \mu(A_2 - A_1) + \cdots + \mu(A_n - A_{n-1})\} = \lim \mu A_n. \]
If $A_n$ is an arbitrary sequence, then, since $B_n = \bigcap_{k \ge n} A_k \uparrow \liminf A_n$ and $\mu A_n \ge \mu B_n$, it follows that

\[ \liminf \mu A_n \ge \lim \mu B_n = \mu(\liminf A_n), \]

and the first assertion is proved.
Let now $\mu$ be finite and use the proved assertion. If $A_n \downarrow A$, then $A_1 - A_n \uparrow A_1 - A$ and, hence,

\[ \mu A_1 - \mu A_n = \mu(A_1 - A_n) \uparrow \mu(A_1 - A) = \mu A_1 - \mu A, \]

so that $\mu A_n \downarrow \mu A$. If $A_n$ is an arbitrary sequence, then $\mu\Omega - \limsup \mu A_n = \liminf \mu A_n^c \ge \mu(\liminf A_n^c) = \mu\Omega - \mu(\limsup A_n)$ and, hence, $\limsup \mu A_n \le \mu(\limsup A_n)$. Finally, if $A_n \to A$, then the two inequalities proved above yield $\mu A_n \to \mu A$, and the proof is complete.


The introduction of measures yields new types of convergence founded 
upon the notion of measure and unknown in classical analysis. Before 
we introduce them, we recall the classical types of convergence; unless 
otherwise stated, we consider sequences $X_n$ of measurable functions on
a fixed measure space $(\Omega, \mathcal{A}, \mu)$ and limits taken as $n \to \infty$.


If $X_n$ converges to $X$ on $A$ according to a definition "c" of convergence, we say that $X_n$ converges "c" on $A$ and write $X_n \xrightarrow{c} X$ on $A$. The Cauchy convergence criterion leads to the corresponding notion of mutual convergence: if $X_{n+\nu} - X_n$ converges "c" to 0 on $A$ uniformly in $\nu$ (or $X_m - X_n$ converges "c" to 0 on $A$ as $m, n \to \infty$), we say that $X_n$ mutually converges "c" on $A$ and write $X_{n+\nu} - X_n \xrightarrow{c} 0$ (or $X_m - X_n \xrightarrow{c} 0$). In defining mutual convergence, we naturally must assume that the differences exist, that is, meaningless expressions $+\infty - \infty$ do not occur. We drop "on $A$" if $A = \Omega$ and drop "c" if the convergence is ordinary pointwise convergence.

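We recall that $X_n \to X$ on $A$ means that, for every $\omega \in A$ and every $\epsilon > 0$, there is an integer $n_{\epsilon,\omega}$ such that, for $n \ge n_{\epsilon,\omega}$,

if $X(\omega)$ is finite, then $|X(\omega) - X_n(\omega)| < \epsilon$,

if $X(\omega) = -\infty$, then $X_n(\omega) < -\dfrac{1}{\epsilon}$,

if $X(\omega) = +\infty$, then $X_n(\omega) > +\dfrac{1}{\epsilon}$.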
If $n_{\epsilon,\omega} = n_\epsilon$ is independent of $\omega \in A$, then the convergence is uniform and, according to the preceding conventions, we write $X_n \xrightarrow{\text{unif}} X$ on $A$. According to the closure property of measurable functions, if a sequence of measurable functions $X_n \to X$, then $X$ is measurable. According to the Cauchy criterion, if $X_n$ are finite, then

$X_n \to X$ finite if, and only if, $X_m - X_n \to 0$ or, equivalently, $X_{n+\nu} - X_n \to 0$;

$X_n \xrightarrow{\text{unif}} X$ finite if, and only if, $X_m - X_n \xrightarrow{\text{unif}} 0$ or, equivalently, $X_{n+\nu} - X_n \xrightarrow{\text{unif}} 0$.

6.2 Convergence almost everywhere. A sequence $X_n$ is said to converge a.e. to $X$, and we write $X_n \xrightarrow{\text{a.e.}} X$, if $X_n \to X$ outside a null set; it mutually converges a.e., and we write $X_m - X_n \xrightarrow{\text{a.e.}} 0$ or $X_{n+\nu} - X_n \xrightarrow{\text{a.e.}} 0$, if it mutually converges outside a null set. It follows, by the Cauchy criterion and the fact that a countable union of null sets is a null set, that


a. A sequence of a.e. finite functions converges a.e. to an a.e. finite func- 
tion if, and only if, the sequence mutually converges a.e. 


Let $X_n \xrightarrow{\text{a.e.}} X$. Since $X_n$ are taken to be measurable, $X$ is a.e. measurable, that is, $X$ is the a.e. limit of a sequence of simple functions. Also, if $X'$ is such that $X_n \xrightarrow{\text{a.e.}} X'$, then $X = X'$ a.e., for $X$ can differ from $X'$ only on the null set on which $X_n$ converges neither to $X$ nor to $X'$. Thus, the limit of the sequence $X_n$ is a.e. determined and a.e. measurable. Moreover, if every $X_n$ is modified arbitrarily on a null set $N_n$, then the whole sequence is modified at most on the null set $\bigcup N_n$, and, therefore, the so modified sequence still converges a.e. to $X$.

These considerations lead to the introduction of the notion of ‘‘equiv- 
alent” functions: X and X’ are equivalent if X = X’ a.e. Since the no- 
tion has the usual properties of an equivalence—it is reflexive, transi- 
tive, and symmetric—it follows that the class of all functions on our 
measure space splits into equivalence classes, and the discussion which 
precedes can be summarized as follows. 


b. Convergence a.e. is a type of convergence of equivalence classes to an 
equivalence class. 


In other words, as long as we are concerned with convergence a.e. of 
sequences of functions, these functions as well as the limit functions are 
to be considered as defined up to an equivalence. In particular, we
can replace an a.e. finite and a.e. measurable function by a finite and 
measurable function, and conversely, without destroying convergence 
a.e. 

Let us investigate in more detail the set on which a given sequence converges. To simplify, we restrict ourselves to the most important case of finite measurable functions, the study of the general case being similar. By definition of ordinary convergence, the set of convergence $[X_n \to X]$ of finite $X_n$ to a finite measurable $X$ is the set of all points $\omega \in \Omega$ at which, for every $\epsilon > 0$, $|X(\omega) - X_n(\omega)| < \epsilon$ for $n \ge n_{\epsilon,\omega}$ sufficiently large. Since, moreover, the requirement "for every $\epsilon > 0$" is equivalent to "for every term of a sequence $\epsilon_k \downarrow 0$ as $k \to \infty$," say, the sequence $\dfrac{1}{k}$, we have

\[ [X_n \to X] = \bigcap_{\epsilon > 0} \bigcup_n \bigcap_\nu \big[\,|X_{n+\nu} - X| < \epsilon\,\big] = \bigcap_k \bigcup_n \bigcap_\nu \Big[\,|X_{n+\nu} - X| < \frac{1}{k}\,\Big], \]

so that the set $[X_n \to X]$ is measurable. Similarly for the set of mutual convergence, since the set

\[ [X_{n+\nu} - X_n \to 0] = \bigcap_{\epsilon > 0} \bigcup_n \bigcap_\nu \big[\,|X_{n+\nu} - X_n| < \epsilon\,\big] = \bigcap_k \bigcup_n \bigcap_\nu \Big[\,|X_{n+\nu} - X_n| < \frac{1}{k}\,\Big] \]

is measurable. Thus


c. The sets of convergence (to a finite measurable function) and of mu- 
tual convergence of a sequence of finite measurable functions are measurable. 


In other words, to every sequence we can assign a "measure of convergence" and, the sets of divergence $[X_n \nrightarrow X]$ and $[X_{n+\nu} - X_n \nrightarrow 0]$ being complements of those of convergence and, hence, measurable, to every sequence we can assign a "measure of divergence." In particular, the definitions of a.e. convergence of a sequence $X_n$ mean that

\[ \mu[X_n \nrightarrow X] = 0 \quad \text{or} \quad \mu[X_{n+\nu} - X_n \nrightarrow 0] = 0. \]

Upon applying repeatedly the sequences theorem to the above-defined sets, we obtain the following
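A. CONVERGENCE A.E. CRITERION. Let $X$, $X_n$ be finite measurable functions.

$X_n \xrightarrow{\text{a.e.}} X$ if, and only if, for every $\epsilon > 0$,

\[ \mu \bigcap_n \bigcup_\nu \big[\,|X_{n+\nu} - X| \ge \epsilon\,\big] = 0 \]

and, if $\mu$ is finite, this criterion becomes

\[ \mu \bigcup_\nu \big[\,|X_{n+\nu} - X| \ge \epsilon\,\big] \to 0. \]

$X_{n+\nu} - X_n \xrightarrow{\text{a.e.}} 0$ if, and only if, for every $\epsilon > 0$,

\[ \mu \bigcap_n \bigcup_\nu \big[\,|X_{n+\nu} - X_n| \ge \epsilon\,\big] = 0 \]

and, if $\mu$ is finite, this criterion becomes

\[ \mu \bigcup_\nu \big[\,|X_{n+\nu} - X_n| \ge \epsilon\,\big] \to 0. \]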


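6.3 Convergence in measure. A sequence $X_n$ of finite measurable functions is said to converge in measure to a measurable function $X$ and we write $X_n \xrightarrow{\mu} X$ if, for every $\epsilon > 0$,

\[ \mu\big[\,|X_n - X| \ge \epsilon\,\big] \to 0. \]

The limit function $X$ is then necessarily a.e. finite, since

\[ \mu\big[\,|X| = \infty\,\big] = \mu\big[\,|X_n - X| = \infty\,\big] \le \mu\big[\,|X_n - X| \ge \epsilon\,\big] \to 0. \]

Similarly, $X_{n+\nu} - X_n \xrightarrow{\mu} 0$ if, for every $\epsilon > 0$,

\[ \mu\big[\,|X_{n+\nu} - X_n| \ge \epsilon\,\big] \to 0 \quad (\text{uniformly in } \nu). \]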


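All considerations about equivalence classes in the case of convergence a.e. remain valid for convergence in measure. In particular, if $X_n \xrightarrow{\mu} X$ and $X_n \xrightarrow{\mu} X'$, then $X$ and $X'$ are equivalent, for

\[ \mu\big[\,|X - X'| \ge \epsilon\,\big] \le \mu\Big[\,|X - X_n| \ge \frac{\epsilon}{2}\,\Big] + \mu\Big[\,|X_n - X'| \ge \frac{\epsilon}{2}\,\Big] \to 0 \]

and, hence,

\[ \mu[X \ne X'] = \mu \bigcup_k \Big[\,|X - X'| \ge \frac{1}{k}\,\Big] = 0. \]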
We compare now convergence in measure and convergence a.e.

A. COMPARISON OF CONVERGENCES THEOREM. Let $X_n$ be a sequence of finite measurable functions.

If $X_n$ converges or mutually converges in measure, then there is a subsequence $X_{n_k}$ which converges in measure and a.e. to the same limit function. If $\mu$ is finite, then convergence a.e. to an a.e. finite function implies convergence in measure to the same limit function.


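Proof. The second assertion is an immediate consequence of the a.e. convergence criterion, since $\mu$ finite and $X_n \xrightarrow{\text{a.e.}} X$ imply that, for every $\epsilon > 0$,

\[ \mu\big[\,|X_n - X| \ge \epsilon\,\big] \le \mu \bigcup_\nu \big[\,|X_{n+\nu} - X| \ge \epsilon\,\big] \to 0. \]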


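As for the first assertion, let $X_{n+\nu} - X_n \xrightarrow{\mu} 0$. Then, for every integer $k$ there is an integer $n(k)$ such that, for $n \ge n(k)$ and all $\nu$,

\[ \mu\Big[\,|X_{n+\nu} - X_n| \ge \frac{1}{2^k}\,\Big] < \frac{1}{2^k}. \]

Let $n_1 = n(1)$, $n_2 = \max(n_1 + 1, n(2))$, $n_3 = \max(n_2 + 1, n(3))$, etc., so that $n_1 < n_2 < n_3 < \cdots \to \infty$. Let $X'_k = X_{n_k}$ and

\[ A_k = \Big[\,|X'_{k+1} - X'_k| \ge \frac{1}{2^k}\,\Big], \qquad B_n = \bigcup_{k \ge n} A_k, \]

so that

\[ \mu A_k < \frac{1}{2^k}, \qquad \mu B_n \le \sum_{k \ge n} \mu A_k < \frac{1}{2^{n-1}}. \]

Thus, for a given $\epsilon > 0$, $n$ large enough so that $\dfrac{1}{2^{n-1}} < \epsilon$, and all $\nu$, we have on $B_n^c$

\[ |X'_{n+\nu} - X'_n| \le \sum_{k \ge n} |X'_{k+1} - X'_k| < \frac{1}{2^{n-1}} < \epsilon. \]

Therefore,

\[ \mu \bigcup_\nu \big[\,|X'_{n+\nu} - X'_n| \ge \epsilon\,\big] \le \mu \bigcup_\nu \Big[\,|X'_{n+\nu} - X'_n| \ge \frac{1}{2^{n-1}}\,\Big] \le \mu B_n < \frac{1}{2^{n-1}} \to 0 \]

and, hence, by the convergence a.e. criterion, $X'_{n+\nu} - X'_n \xrightarrow{\text{a.e.}} 0$.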
Thus, by 6.2a, there is a finite $X'$ such that $X'_n \xrightarrow{\text{a.e.}} X'$. Since on $B_n^c$ we have $|X'_{n+\nu} - X'_n| < \epsilon$ for all $\nu$ it follows, upon letting $\nu \to \infty$, that on $B_n^c$ we also have $|X' - X'_n| < \epsilon$ outside perhaps a null subset. Therefore, upon taking complements,

\[ \mu\big[\,|X' - X'_n| \ge \epsilon\,\big] \le \mu B_n < \frac{1}{2^{n-1}} \to 0, \]

so that $X'_n \xrightarrow{\mu} X'$. A similar argument shows that $X_n \xrightarrow{\mu} X$ implies $X'_n \xrightarrow{\text{a.e.}} X$. This completes the proof.


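COROLLARY. Convergence and mutual convergence in measure imply one another.

Proof. If $X_n \xrightarrow{\mu} X$, then, for every $\epsilon > 0$ and all $\nu$,

\[ \mu\big[\,|X_{n+\nu} - X_n| \ge \epsilon\,\big] \le \mu\Big[\,|X_{n+\nu} - X| \ge \frac{\epsilon}{2}\,\Big] + \mu\Big[\,|X - X_n| \ge \frac{\epsilon}{2}\,\Big] \to 0, \]

so that $X_{n+\nu} - X_n \xrightarrow{\mu} 0$. Conversely, if $X_{n+\nu} - X_n \xrightarrow{\mu} 0$, then, upon taking the subsequence $X_{n_k}$ of the foregoing theorem, we obtain, for every $\epsilon > 0$, by letting $n_k, n \to \infty$,

\[ \mu\big[\,|X - X_n| \ge \epsilon\,\big] \le \mu\Big[\,|X - X_{n_k}| \ge \frac{\epsilon}{2}\,\Big] + \mu\Big[\,|X_{n_k} - X_n| \ge \frac{\epsilon}{2}\,\Big] \to 0, \]

so that $X_n \xrightarrow{\mu} X$, and the corollary is proved.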
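The gap between the two modes of convergence can be seen on a standard example that is not in the text and is offered here only as a sketch: on $[0, 1)$ with Lebesgue measure, let $X_n$ be the indicator of the dyadic interval $[j/2^k, (j+1)/2^k)$, where $n = 2^k + j$ and $0 \le j < 2^k$. The function names below are illustrative.

```python
from fractions import Fraction


def block(n):
    """Return (k, j) with n = 2**k + j and 0 <= j < 2**k, for n >= 1."""
    k = n.bit_length() - 1
    return k, n - (1 << k)


def X(n, w):
    """Indicator of the dyadic interval [j/2**k, (j+1)/2**k), evaluated at w in [0, 1)."""
    k, j = block(n)
    return 1 if Fraction(j, 2 ** k) <= w < Fraction(j + 1, 2 ** k) else 0


def measure_of_exceedance(n):
    """Lebesgue measure of [|X_n| >= eps] for any 0 < eps <= 1: the interval length."""
    k, _ = block(n)
    return Fraction(1, 2 ** k)
```

The measure of $[\,|X_n| \ge \epsilon\,]$ is $2^{-k} \to 0$, so $X_n \xrightarrow{\mu} 0$; yet every point $\omega$ falls in exactly one interval of each level $k$, so $X_n(\omega) = 1$ infinitely often and the sequence converges at no point. The subsequence $X_{2^k}$ (the leftmost interval of each level) converges to 0 at every $\omega > 0$, in line with the comparison theorem.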


§ 7. INTEGRATION 


The concepts of o-field, measure, and measurable function are born 
from the efforts, made in the nineteenth and the beginning of the twen- 
tieth centuries, to extend the concept of integration to wider and wider 
classes of functions. The decisive extension was accomplished by Le- 
besgue, after Borel opened the way. Lebesgue worked with the special 
“Lebesgue” measure. Radon applied the same approach working with 
Lebesgue-Stieltjes measures. Finally, Fréchet, still using Lebesgue’s 
approach, got rid of the restrictions on the measure space on which 
the numerical functions to be integrated were defined. 

Lebesgue had two equivalent definitions of the integral, a descrip- 
tive one and a constructive one. We shall use a constructive definition
of the integral of which there are many variants, but the basic
ideas are always the same and, in general, the integral is first defined
for simple functions. Although infinite values are not excluded, nevertheless,
the expression $+\infty - \infty$, being meaningless, must be avoided.
Therefore, it behooves us to start with integrals of functions of constant 
sign, say, nonnegative ones. Furthermore, the central property of the 
integral, called “the monotone convergence theorem,” says that for a 
nondecreasing sequence of nonnegative functions integration and pas- 
sage to the limit can be interchanged. Therefore, we give here the ap- 
proach aimed directly at this theorem, an approach which requires a 
minimum of notions and of effort. The reader will recognize in the 
central definition 2° below, a particular form of the monotone conver- 
gence theorem. 

7.1 Integrals. We consider a fixed measure space $(\Omega, \mathcal{A}, \mu)$; $A, B, \cdots$, and $X, Y, \cdots$, with or without affixes, will denote measurable sets and (numerical) measurable functions, respectively.


DEFINITIONS. 1° The integral on $\Omega$ of a nonnegative simple function $X = \sum_{j=1}^{m} x_j I_{A_j}$ is defined by

\[ \int_\Omega X \, d\mu = \sum_{j=1}^{m} x_j \mu A_j. \]

2° The integral on $\Omega$ of a nonnegative measurable function $X$ is defined by

\[ \int_\Omega X \, d\mu = \lim \int_\Omega X_n \, d\mu, \]

where $X_n$ is a nondecreasing sequence of nonnegative simple functions which converges to $X$.

3° The integral on $\Omega$ of a measurable function $X$ is defined by

\[ \int_\Omega X \, d\mu = \int_\Omega X^+ \, d\mu - \int_\Omega X^- \, d\mu, \]

where $X^+ = X I_{[X \ge 0]}$ and $X^- = -X I_{[X < 0]}$ are the positive and negative parts of $X$ respectively, provided the defining difference exists, that is, provided at least one of the terms of this difference is finite.

If $\int_\Omega X \, d\mu$ is finite, that is, if both of the terms of the difference are finite, $X$ is said to be integrable on $\Omega$.
Finally, if $X$ is a.e. determined and measurable, that is, there exists a measurable function $X'$ such that $X = X'$ outside a $\mu$-null set, we set $\int X = \int X'$, provided the right-hand side exists.


Upon replacing, in the preceding definitions, $\Omega$ by a measurable set $A$ (hence replacing, in 1°, every $A_j = \Omega A_j$ by $A A_j$), they become definitions of the integral of $X$ on $A$, to be denoted by $\int_A X \, d\mu$. Since, for $X = \sum_{j=1}^{m} x_j I_{A_j} \ge 0$, we have

\[ \int_\Omega X I_A \, d\mu = \sum_{j=1}^{m} x_j \mu A A_j \le \int_\Omega X \, d\mu, \]

it follows immediately that

if $\int_\Omega X \, d\mu$ exists so does $\int_A X \, d\mu$, and $\int_A X \, d\mu = \int_\Omega X I_A \, d\mu$.

To simplify the writing, we drop $d\mu$ and $\Omega$ in the foregoing symbols, unless confusion is possible; thus, the symbols $\int_\Omega X \, d\mu$ and $\int_A X \, d\mu$ will be replaced by $\int X$ and $\int_A X$, respectively.


Justification and additivity. We have to justify the three definitions 
1°, 2°, 3°, that is, we have to show that the concepts as defined exist 
and are uniquely determined. In the course of the justification we 
shall have use for the elementary properties below; the first one is 
called the additivity property of the operation of integration. 


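A. ELEMENTARY PROPERTIES. Let $\int X$, $\int Y$, $\int X + \int Y$ exist.

I Linearity:

\[ \int (X + Y) = \int X + \int Y, \qquad \int_{A+B} X = \int_A X + \int_B X, \qquad \int cX = c \int X. \]

II Order-preservation:

\[ X \ge 0 \Rightarrow \int X \ge 0, \qquad X \ge Y \Rightarrow \int X \ge \int Y, \qquad X = Y \text{ a.e.} \Rightarrow \int X = \int Y. \]

III Integrability:

$X$ integrable $\Leftrightarrow$ $|X|$ integrable $\Rightarrow$ $X$ a.e. finite;
$|X| \le Y$ integrable $\Rightarrow$ $X$ integrable;
$X$ and $Y$ integrable $\Rightarrow$ $X + Y$ integrable.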


Assume that the additivity property is proved. Then the second of properties I follows by replacing in the first one $X$ by $X I_A$ and $Y$ by $X I_B$. The third one follows directly by successive use of the definitions.

The successive use of the definitions also proves directly the first and third of properties II, and the second one follows by the additivity property upon setting $X = Y + Z$, where $Z \ge 0$.

Similarly for properties III, except for $|X|$ integrable $\Rightarrow$ $X$ a.e. finite. But, if $\mu A > 0$ where $A = [\,|X| = \infty\,]$, then, on account of II, $\int |X| \ge \int |X| I_A \ge c \mu A$ whatever be $c > 0$. It follows, by letting $c \to \infty$, that $\int |X| = \infty$, and the property is proved ad contrario.
Thus 


For each of the successive definitions, the elementary properties hold as 
soon as the additivity property is proved. 


We use this fact repeatedly in proceeding to the successive justifica- 
tions of the definitions and to the proof of the additivity property. 


1° Nonnegative simple functions. Since $X = \sum_{j=1}^{m} x_j I_{A_j}$ is nonnegative, the defining sum in

\[ \int X = \sum_{j=1}^{m} x_j \mu A_j \ge 0 \]

exists; it may be infinite. Its value is independent of the way in which $X$ is written. For, if $X$ is written in some other form $\sum_{k=1}^{n} y_k I_{B_k}$, then $x_j = y_k$ if $A_j B_k \ne \emptyset$ and, from $\sum_{j=1}^{m} A_j = \sum_{k=1}^{n} B_k = \Omega$, it follows that

\[ \sum_{j=1}^{m} x_j \mu A_j = \sum_{j,k} x_j \mu A_j B_k = \sum_{j,k} y_k \mu A_j B_k = \sum_{k=1}^{n} y_k \mu B_k. \]
Thus, $\int X$ is unambiguously defined.

Let now $X = \sum_{j=1}^{m} x_j I_{A_j}$ and $Y = \sum_{k=1}^{n} y_k I_{B_k}$ be two nonnegative simple functions, so that $X + Y = \sum_{j,k} (x_j + y_k) I_{A_j B_k}$. Proceeding as above, we have

\[ \int (X + Y) = \sum_{j,k} (x_j + y_k) \mu A_j B_k = \sum_{j,k} x_j \mu A_j B_k + \sum_{j,k} y_k \mu A_j B_k = \sum_{j} x_j \mu A_j + \sum_{k} y_k \mu B_k = \int X + \int Y, \]

and the additivity property is proved.
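The two facts just proved for nonnegative simple functions (independence of the representation, and additivity via the common refinement $\{A_j B_k\}$) can be checked on a small example. The sketch below assumes a hypothetical four-point space with illustrative point masses; `MU`, `integral_simple` and `add_simple` are names introduced here, not the author's.

```python
from fractions import Fraction

# Hypothetical point masses on Omega = {0, 1, 2, 3}.
MU = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(2)}


def mu(A):
    """Measure of a subset A of Omega."""
    return sum((MU[w] for w in A), Fraction(0))


def integral_simple(terms):
    """Integral of sum_j x_j * I_{A_j}, given as [(x_j, A_j), ...] with disjoint A_j."""
    return sum((x * mu(A) for x, A in terms), Fraction(0))


def add_simple(terms_x, terms_y):
    """Representation of X + Y on the common refinement {A_j B_k}, empty cells dropped."""
    return [(x + y, A & B) for x, A in terms_x for y, B in terms_y if A & B]


# The same function X written in two ways, and a second function Y.
X_coarse = [(1, {0, 1}), (3, {2, 3})]
X_fine = [(1, {0}), (1, {1}), (3, {2}), (3, {3})]
Y = [(2, {0, 2}), (0, {1, 3})]
```

Here `integral_simple(X_coarse)` and `integral_simple(X_fine)` agree, and the integral of `add_simple(X_coarse, Y)` equals the sum of the two separate integrals.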


2° Nonnegative measurable functions. In definition 2°, the sequence of simple functions $X_n \ge 0$ is nondecreasing, so that, by AII for simple functions, the sequence $\int X_n$ is nondecreasing and, hence, has a limit, finite or not. Moreover, for every nonnegative measurable function $X$ there exists such a sequence $X_n \uparrow X$. Therefore, to justify the definition, it suffices to show that the defining limit is independent of the particular choice of the sequence $X_n$. In other words

a. If two nondecreasing sequences $X_n$ and $Y_n$ of nonnegative simple functions have the same limit, then

\[ \lim \int X_n = \lim \int Y_n. \]


Proof. It suffices to prove that $0 \le X_n \uparrow X$ and $\lim X_n \ge Y$, where $Y$ is a nonnegative simple function, imply $\lim \int X_n \ge \int Y$. For, then, it follows from the assumptions that, for every integer $p$,

\[ \lim \int X_n \ge \int Y_p, \qquad \lim \int Y_n \ge \int X_p, \]

and the asserted equality is obtained by letting $p \to \infty$.
First, we prove the asserted inequality under the supplementary restrictions

\[ \mu\Omega < \infty, \qquad m = \min Y > 0, \qquad M = \max Y < \infty. \]

Let $\epsilon > 0$ be less than $m$. Since $\lim X_n \ge Y$, it follows that $A_n = [X_n > Y - \epsilon] \uparrow \Omega$. But, on account of the validity of A for simple functions and the finiteness of $\mu$ and $Y$, we have

\[ \int X_n \ge \int X_n I_{A_n} \ge \int (Y - \epsilon) I_{A_n} = \int Y - \int Y I_{A_n^c} - \epsilon \mu A_n \ge \int Y - M \mu A_n^c - \epsilon \mu A_n \]

and, hence, by letting $n \to \infty$ and then $\epsilon \to 0$, the asserted inequality follows. Now, we get rid of the supplementary restrictions.
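If $\mu\Omega = \infty$, then

\[ \int X_n \ge \int X_n I_{A_n} \ge \int (Y - \epsilon) I_{A_n} \ge (m - \epsilon) \mu A_n \to \infty, \]

and the asserted inequality is trivially true.

If $M = \infty$, then, the inequality being valid with $X_n$ and $Y I_{[Y < \infty]} + c I_{[Y = \infty]}$ where $c$ is an arbitrary finite number, we have

\[ \lim \int X_n \ge \int Y I_{[Y < \infty]} + c \mu[Y = \infty] \]

and, letting $c \to \infty$, the right-hand side becomes $\int Y$.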


Finally, if $m = 0$, then, since the functions $X_n$ and $Y$ are nonnegative and, by what precedes, the inequality is true for integrals on $[Y > 0]$, we have

\[ \lim \int X_n \ge \lim \int_{[Y > 0]} X_n \ge \int_{[Y > 0]} Y = \int Y. \]

This completes the proof and the definition of the integral of a nonnegative measurable function is justified.

Since the additivity property was proved for nonnegative simple functions $X_n$, $Y_n$, and $0 \le X_n \uparrow X$, $0 \le Y_n \uparrow Y$ imply $0 \le X_n + Y_n \uparrow X + Y$, it follows, by letting $n \to \infty$ in

\[ \int (X_n + Y_n) = \int X_n + \int Y_n, \]

that

\[ \int (X + Y) = \int X + \int Y. \]

Thus, the additivity property remains valid for nonnegative measurable functions.
3° Measurable functions. The decomposition $X = X^+ - X^-$ of a measurable function into its positive and negative parts is unique, so that $\int X = \int X^+ - \int X^-$ is unambiguously defined, provided $\int X^+$ or $\int X^-$ is finite.

Finally, if $X$ is determined and measurable outside a $\mu$-null set $N$, then let $X'$ be any measurable function such that $X = X'$ on $N^c$. The integral of $X$ is defined by setting $\int X = \int X'$, provided $\int X'$ exists. By AII for nonnegative measurable functions, the integrals of such functions which coincide on $N^c$ are equal. It follows, by definition 3°, that the same is true when the functions are not of constant sign. Therefore, $\int X$ is unambiguously defined.
It remains to prove the additivity property. 
Since we assume that not only $\int X$ and $\int Y$ exist but also that $\int X + \int Y$ exists, that is, is not of the form $+\infty - \infty$, it follows that (excluding the trivial case of the three integrals infinite of the same sign) at least one of the functions, say $Y$, is integrable and, hence, by AIII, is a.e. finite. Therefore, $X + Y$ is a.e. determined, and we do not restrict the generality by taking determined $X$ and $Y$, and changing $Y$ to 0 on the $\mu$-null event on which it is infinite and $X + Y$ may be not determined.

We decompose $\Omega$ into the six sets on each of which $X$, $Y$, and $X + Y$ are of constant sign ($\ge 0$ or $< 0$). Because of definition 3° and property AI for nonnegative functions, it suffices to prove the additivity property on each of these sets, say $A = [X \ge 0, \; Y < 0, \; X + Y \ge 0]$. But, on account of definition 3° and the additivity property for nonnegative functions $(X + Y) I_A$ and $-Y I_A$, we have

\[ \int_A X = \int_A (X + Y) + \int_A (-Y) = \int_A (X + Y) - \int_A Y \]

and, $\int_A Y$ being finite,

\[ \int_A X + \int_A Y = \int_A (X + Y). \]

Similarly for the other sets, and the additivity property follows.




This completes the justification of the definitions and the proof of 
the elementary properties. 

7.2 Convergence theorems. The central convergence property is as 
follows: 


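A. MONOTONE CONVERGENCE THEOREM. If $0 \le X_n \uparrow X$, then $\int X_n \uparrow \int X$.

Proof. Choose nonnegative simple functions $X_{km} \uparrow X_k$ as $m \to \infty$. The sequence $Y_n = \max_{k \le n} X_{kn}$ of nonnegative simple functions is nondecreasing, and, for $k \le n$,

\[ X_{kn} \le Y_n \le X_n, \qquad \int X_{kn} \le \int Y_n \le \int X_n. \]

It follows, by letting $n \to \infty$, that

\[ X_k \le \lim Y_n \le X, \qquad \int X_k \le \int \lim Y_n \le \lim \int X_n \]

and, by letting $k \to \infty$, we obtain

\[ X \le \lim Y_n \le X, \qquad \lim \int X_k \le \int \lim Y_n \le \lim \int X_n. \]

Thus $\lim Y_n = X$ and $\int X = \lim \int X_n$. The assertion is proved.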
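A numerical sketch of the theorem, under assumptions that are not in the text: take Lebesgue measure on $[0, 1)$, $X(\omega) = \omega^2$, and the dyadic simple functions $X_n = \lfloor 2^n X \rfloor / 2^n$, which increase to $X$. Each $X_n$ takes the value $k/2^n$ on the interval $[\sqrt{k/2^n}, \sqrt{(k+1)/2^n})$, so its integral is a finite sum by definition 1°.

```python
import math


def integral_of_dyadic_approximation(n):
    """Integral over [0, 1) of X_n = floor(2**n * X) / 2**n for X(w) = w**2.

    X_n equals k / 2**n on [sqrt(k / 2**n), sqrt((k + 1) / 2**n)), so the
    integral of this simple function is a finite sum of value * length.
    """
    N = 2 ** n
    total = 0.0
    for k in range(N):
        length = math.sqrt((k + 1) / N) - math.sqrt(k / N)
        total += (k / N) * length
    return total


# Integrals of X_1, ..., X_12; they should increase toward the integral of X, which is 1/3.
values = [integral_of_dyadic_approximation(n) for n in range(1, 13)]
```

Since $0 \le X - X_n \le 2^{-n}$, the integrals in `values` differ from $1/3$ by at most $2^{-n}$, up to floating-point rounding.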


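COROLLARY 1. The integral is $\sigma$-additive on the family of nonnegative measurable functions.

This means that, if the $X_n$ are nonnegative, then $\int \sum X_n = \sum \int X_n$, and follows by $0 \le \sum_{k=1}^{n} X_k \uparrow \sum X_n$.

COROLLARY 2. If $X$ is integrable, then $\int_A |X| \to 0$ as $\mu A \to 0$.

For, if $X_n = X$ or $n$ according as $|X| < n$ or $|X| \ge n$, then $\int |X_n| \uparrow \int |X|$, so that, given $\epsilon > 0$, there exists an $n_0$ such that $\int |X| < \int |X_{n_0}| + \dfrac{\epsilon}{2}$. It follows that, for $A$ with $\mu A < \epsilon / 2n_0$,

\[ \int_A |X| = \int_A |X_{n_0}| + \int_A \big(|X| - |X_{n_0}|\big) \le n_0 \mu A + \int \big(|X| - |X_{n_0}|\big) < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \]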
A A A 2 
The monotone convergence theorem extends as follows:

B. FATOU-LEBESGUE THEOREM. Let $Y$ and $Z$ be integrable functions. If $Y \le X_n$ or $X_n \le Z$, then

\[ \int \liminf X_n \le \liminf \int X_n \quad \text{resp.} \quad \limsup \int X_n \le \int \limsup X_n. \]

If $Y \le X_n \uparrow X$, or $Y \le X_n \le Z$ and $X_n \xrightarrow{\text{a.e.}} X$, then $\int X_n \to \int X$.

Proof. If the $X_n$ are nonnegative, then

\[ X_n \ge Y_n = \inf_{k \ge n} X_k \uparrow \liminf X_n, \]

so that, by the monotone convergence theorem,

\[ \liminf \int X_n \ge \lim \int Y_n = \int \liminf X_n. \]

The asserted inequalities follow, by the additivity property, upon applying this result to the sequences $X_n - Y$ and $Z - X_n$ of nonnegative measurable functions, and the asserted equalities are immediate consequences.

Clearly, if the assumptions of this theorem hold only a.e., the con- 
clusions continue to hold. In fact, the last assertion, frequently called 
the dominated convergence theorem, extends as follows: 


C. DOMINATED CONVERGENCE THEOREM. If $|X_n| \le Y$ a.e. with $Y$ integrable and if $X_n \xrightarrow{\text{a.e.}} X$ or $X_n \xrightarrow{\mu} X$, then $\int X_n \to \int X$. In fact, $\int_A X_n - \int_A X \to 0$ uniformly in $A$ or, equivalently, $\int |X_n - X| \to 0$.

Proof. Since

\[ \Big| \int_A (X_n - X) \Big| \le \int |X_n - X| = \int (X_n - X)^+ + \int (X_n - X)^-, \]

it follows that the last two assertions are equivalent and imply the first one. Thus, it suffices to prove that $\int |X_n - X| \to 0$. Set $Y_n = |X_n - X|$ and observe that $Y_n \le 2Y$ a.e. and that the $\int Y_n$ remain the same when the $Y_n$ are modified on null events. Therefore, it suffices to prove that, if $0 \le Y_n \le Z$ integrable and $Y_n \xrightarrow{\text{a.e.}} 0$ or $Y_n \xrightarrow{\mu} 0$, then $\int Y_n \to 0$.

The case $Y_n \xrightarrow{\text{a.e.}} 0$ follows from the last assertion in B. It implies the case $Y_n \xrightarrow{\mu} 0$, since, by selecting a subsequence $Y_{n'}$ ($\xrightarrow{\mu} 0$) such that $\int Y_{n'} \to \limsup \int Y_n$ and, within this subsequence, a sequence $Y_{n''} \xrightarrow{\text{a.e.}} 0$, it follows that $\int Y_{n''} \to 0$ and $\limsup \int Y_n = 0$. Hence $\int Y_n \to 0$, and the proof is complete.
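The role of the integrable majorant can be illustrated by a standard example that is not in the text: on $(0, 1)$ with Lebesgue measure, $X_n = n I_{(0, 1/n)}$ converges to 0 at every point, yet $\int X_n = 1$ for all $n$, because no integrable $Y$ dominates every $X_n$. The bounded sequence $Y_n = I_{(0, 1/n)}$, by contrast, satisfies the theorem. The function names below are illustrative.

```python
from fractions import Fraction


def integral_undominated(n):
    """Integral over (0, 1) of X_n = n * I_(0, 1/n): value times interval length."""
    return Fraction(n) * Fraction(1, n)


def integral_dominated(n):
    """Integral over (0, 1) of Y_n = I_(0, 1/n), which satisfies |Y_n| <= 1."""
    return Fraction(1, n)


def pointwise_limit_is_zero(w, n_max=10_000):
    """Check that X_n(w) = 0 for every n from some index up to n_max, for fixed w in (0, 1)."""
    first_zero = next(n for n in range(1, n_max) if Fraction(1, n) <= w)
    return all(not (0 < w < Fraction(1, n)) for n in range(first_zero, n_max))
```

In this sketch `integral_undominated(n)` stays equal to 1 while `integral_dominated(n)` tends to 0, although both sequences have the same pointwise limit.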


Extension. In all the preceding convergence theorems the parameter $n \to \infty$ can be replaced by a parameter $t \to t_0$ along an arbitrary set $T \subset R$ of values, the reason for this being that $a_t \to a$ as $t \to t_0$ along $T$ is equivalent to $a_{t_n} \to a$ for every sequence $t_n$ in $T$ converging to $t_0$.

Applications I. We assume all functions $X_t$ to be integrable.

The dominated convergence theorem yields at once

1° If $|X_t| \le Y$ integrable and $X_t \to X_{t_0}$ as $t \to t_0$ ($t \in T$), then

\[ \int X_t \to \int X_{t_0}. \]

This proposition yields, by applying the definition of derivative,

2° If, on $T$, $\dfrac{dX_t}{dt}$ exists at $t_0$ and $\Big| \dfrac{X_t - X_{t_0}}{t - t_0} \Big| \le Y$ integrable, then

\[ \Big( \frac{d}{dt} \int X_t \Big)_{t_0} = \int \Big( \frac{dX_t}{dt} \Big)_{t_0}. \]

In turn, this proposition yields

3° If, on a finite interval $[a, b]$, $\dfrac{dX_t}{dt}$ exists and $\Big| \dfrac{dX_t}{dt} \Big| \le Y$ integrable, then, on $[a, b]$,

\[ \frac{d}{dt} \int X_t = \int \frac{dX_t}{dt}. \]

This follows from

\[ X_t - X_{t'} = (t - t') \Big( \frac{dX_t}{dt} \Big)_{t''}, \]

where $t''$ lies between $t$ and $t'$. And in its turn, this proposition yields
4° If, on a finite interval $[a, b]$, $X_t$ is continuous and $|X_t| \le Y$ integrable then, for every $t \in [a, b]$,

\[ \int_a^t \Big( \int X_t \Big) dt = \int \Big( \int_a^t X_t \, dt \Big). \]

Moreover, if the foregoing assumptions hold for every finite interval and $\int_{-\infty}^{+\infty} |X_t| \, dt \le Z$ integrable, then

\[ \int_{-\infty}^{+\infty} \Big( \int X_t \Big) dt = \int \Big( \int_{-\infty}^{+\infty} X_t \, dt \Big). \]

The integrals with respect to $t$ are Riemann integrals.

The first assertion follows from the fact that the derivative of a Riemann integral $\int_a^t g(t) \, dt$ where $g$ is continuous is $g(t)$ which is bounded on $[a, b]$, so that, upon applying 3° to the asserted equality, it follows that derivatives of both sides are equal and, since both sides vanish for $t = a$, the equality is proved. The second assertion follows by 1° from the first one, by letting $a \to -\infty$ and $t \to +\infty$.

II. Integrals over the Borel line. Let $\mathcal{B}$ be the Borel field in $R = (-\infty, +\infty)$ and let $\mu$ be a measure on $\mathcal{B}$ which assigns finite values to finite intervals. Let $\mathcal{B}_\mu$ be the class of all sets which are unions of a Borel set and a subset of a $\mu$-null Borel set. $\mathcal{B}_\mu$ is closed under formation of complements and countable unions and, hence, is a $\sigma$-field. By assigning to every set of $\mathcal{B}_\mu$ the measure of the Borel set from which it differs by a subset of a $\mu$-null set, $\mu$ is extended to a $\sigma$-finite measure on $\mathcal{B}_\mu$, that we continue to denote by $\mu$. $\mathcal{B}_\mu$ will be called a Lebesgue-Stieltjes field in $R$ and $\mu$ on $\mathcal{B}_\mu$ will be called a Lebesgue-Stieltjes measure. The relation

\[ F(b) - F(a) = F[a, b) = \mu[a, b) \]

determines, up to an additive constant, a function $F$ on $R$ which is clearly finite, nondecreasing, and continuous from the left, called a distribution function corresponding to $\mu$. (It was proved that, conversely, such a function determines a Lebesgue-Stieltjes measure $\mu$.)

Let $g$ be a $\mathcal{B}_\mu$-measurable function. If $g$ is integrable, the integral $\int g \, d\mu$ is called a Lebesgue-Stieltjes integral. If $F$ is a distribution function corresponding to $\mu$, this integral is also denoted by $\int g \, dF$, and the integral $\int_{[a, b)} g \, d\mu$ is also denoted by $\int_a^b g \, dF$. If $F(x) = x$, $x \in R$, the corresponding measure is called the Lebesgue measure; it assigns to every interval its "length" and, thus, is a direct extension of the notion of length. The corresponding $\sigma$-field, or Lebesgue field, is formed by Lebesgue sets and the corresponding integrals, say $\int g \, dx$, $\int_a^b g \, dx$, are called Lebesgue integrals. Lebesgue field, measure, and integral are prototypes of general $\sigma$-fields, measures, and integrals. One may say that the basic ideas and methods relative to measure spaces and integrals belong to Lebesgue.


Let $g$ be continuous on $[a, b]$. The Lebesgue-Stieltjes integral $\int_a^b g \, dF$ becomes then a Riemann-Stieltjes integral and the Lebesgue integral $\int_a^b g \, dx$ becomes then a Riemann integral.

The proof is easy. We have to show that, $g$ being continuous on $[a, b]$, $\int_a^b g \, dF$ is limit of Riemann-Stieltjes sums. This is possible because a continuous function on a closed interval is bounded and is the (uniform) limit of any sequence of step-functions

\[ g_n = \sum_{k=1}^{k_n} g(x'_{nk}) I_{[x_{nk}, \, x_{n,k+1})}, \qquad a = x_{n1} < \cdots < x_{n,k_n+1} = b, \qquad x_{nk} \le x'_{nk} < x_{n,k+1}, \]

such that $\max_{k \le k_n} (x_{n,k+1} - x_{nk}) \to 0$. Therefore, by the dominated convergence theorem or, more specifically, by the last assertion of the Fatou-Lebesgue theorem,

\[ \int_a^b g \, d\mu = \lim \int_a^b g_n \, d\mu = \lim \sum_{k=1}^{k_n} g(x'_{nk}) \, \mu[x_{nk}, x_{n,k+1}), \]

that is,

\[ \int_a^b g \, dF = \lim \sum_{k=1}^{k_n} g(x'_{nk}) \, F[x_{nk}, x_{n,k+1}), \]

where the right-hand side sums are precisely the usual Riemann-Stieltjes sums. Thus, in the case of $g$ continuous on $[a, b]$, the integral $\int_a^b g \, dF$ can be defined directly in terms of $F$, or of measures assigned to intervals only.
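The Riemann-Stieltjes sums above are easy to evaluate numerically. The sketch below assumes equal subintervals and the left endpoint $x'_{nk} = x_{nk}$ as evaluation point; the function name and the example pair $g(x) = x$, $F(x) = x^2$ on $[0, 1]$ are illustrative choices, not taken from the text.

```python
def riemann_stieltjes_sum(g, F, a, b, n):
    """Sum of g(x_k) * (F(x_{k+1}) - F(x_k)) over n equal subintervals of [a, b),
    using the left endpoint x_k of each subinterval as evaluation point."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        left = a + k * h
        right = a + (k + 1) * h
        total += g(left) * (F(right) - F(left))
    return total


# Example: g(x) = x against F(x) = x**2 on [0, 1]; the exact value is
# the integral of x * 2x dx over [0, 1], that is, 2/3.
approximation = riemann_stieltjes_sum(lambda x: x, lambda x: x * x, 0.0, 1.0, 10_000)
```

Only increments of $F$ over intervals enter the sum, which is the point of the closing remark: for continuous $g$ the integral is determined by the measures assigned to intervals.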
However, when $g$ is continuous on $R$, its Lebesgue-Stieltjes integral over $R$ and its improper Riemann-Stieltjes integral do not necessarily coincide. In fact, the last integral is defined by

\[ \int_{-\infty}^{+\infty} g \, dF = \lim_{\substack{a \to -\infty \\ b \to +\infty}} \int_a^b g \, dF, \]

provided the limit exists and is finite. It may happen that at the same time

\[ \int |g| \, dF = \lim_{\substack{a \to -\infty \\ b \to +\infty}} \int_a^b |g| \, dF \]

is infinite so that, $|g|$ not being Lebesgue-Stieltjes integrable, $g$ is not Lebesgue-Stieltjes integrable. Such examples are familiar; one of the most classical ones is that of the improper Riemann-integral of $g(x) = \sin x / x$. However, if $g$ is Lebesgue-Stieltjes integrable then, clearly,
both integrals coincide. Thus, the class of continuous functions whose 
improper Riemann-Stieltjes integrals with respect to a distribution 
function F exist (and are finite) contains the class of continuous func- 
tions which are Lebesgue-Stieltjes integrable with respect to F. 
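The classical example $g(x) = \sin x / x$ can be examined numerically. The sketch below works on the half-line $(0, N\pi)$ (the function is even, so the full line behaves the same way) and uses a midpoint rule on each half-period; the quadrature choice and the function names are assumptions made for illustration only.

```python
import math


def midpoint_integral(f, a, b, m):
    """Midpoint-rule approximation of the integral of f over [a, b] with m cells."""
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))


def sinc(x):
    """g(x) = sin x / x for x != 0 (the midpoints used below never hit 0)."""
    return math.sin(x) / x


def integrals_up_to(periods, m=200):
    """Return (integral of sin x / x, integral of |sin x / x|) over (0, periods * pi)."""
    signed = 0.0
    absolute = 0.0
    for k in range(periods):
        a, b = k * math.pi, (k + 1) * math.pi
        piece = midpoint_integral(sinc, a, b, m)
        signed += piece
        absolute += abs(piece)  # sin x / x keeps one sign on each (k*pi, (k+1)*pi)
    return signed, absolute
```

As `periods` grows, the signed value settles near $\pi/2$ while the absolute value keeps growing, roughly like $(2/\pi) \log N$, which is the behavior described above: the improper Riemann integral exists although $|g|$ is not integrable.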


§ 8. INDEFINITE INTEGRALS; ITERATED INTEGRALS


8.1 Indefinite integrals and Lebesgue decomposition. We characterize now the indefinite integrals by using repeatedly the monotone convergence theorem. Let $X$ be a measurable function whose integral exists, say, $\int X^-$ is finite. Then the indefinite integral $\varphi$ on $\mathcal{A}$ defined by

\[ \varphi(A) = \int_A X = \int X I_A \]

exists, for $\int X^- I_A$ is finite and $\int X^+ I_A$ exists. Since the integral of a function which vanishes a.e. is 0, the indefinite integral is $\mu$-continuous, that is, vanishes for $\mu$-null sets. Since for a countable measurable partition $\{A_j\}$, $X^{\pm} I_A = \sum X^{\pm} I_{A A_j}$, it follows that, by the monotone convergence theorem,

\[ \int_{\sum A_j} X = \sum \int_{A_j} X \]

and the indefinite integral is $\sigma$-additive.
If $X$ is integrable, then it is a.e. finite, and the indefinite integral is finite. If $X$ is not integrable but, still, $X$ is a.e. finite and $\mu$ is $\sigma$-finite, then the indefinite integral is $\sigma$-finite. For, by decomposing $\Omega$ into sets $A_n$ of finite measure, we have

\[ \int X = \sum_{m=-\infty}^{+\infty} \sum_{n=1}^{\infty} \int_{A_n [m \le X < m+1]} X \]

and every term of the double sum is finite.

The problem which arises is whether the foregoing properties characterize indefinite integrals and the answer lies in the celebrated Lebesgue (-Radon-Nikodym) decomposition theorem that we shall establish now. But first we introduce a notion in opposition to that of $\mu$-continuity. A set function $\varphi_s$ on $\mathcal{A}$ is said to be $\mu$-singular if it vanishes outside a $\mu$-null set; in symbols, there is a $\mu$-null set $N$ such that

\[ \varphi_s(A N^c) = 0, \qquad A \in \mathcal{A}. \]


A. LEBESGUE DECOMPOSITION THEOREM. If, on $\mathcal{A}$, the measure $\mu$ and the $\sigma$-additive function $\varphi$ are $\sigma$-finite, then there exists one, and only one, decomposition of $\varphi$ into a $\mu$-continuous and $\sigma$-additive set function $\varphi_c$ and a $\mu$-singular and $\sigma$-additive set function $\varphi_s$,

\[ \varphi = \varphi_c + \varphi_s, \]

and $\varphi_c$ is the indefinite integral of a finite measurable function $X$ determined up to a $\mu$-equivalence.

$\varphi_c$ and $\varphi_s$ are called $\mu$-continuous and $\mu$-singular parts of $\varphi$, and $X$ is called the derivative $d\varphi_c / d\mu$ with respect to $\mu$; we emphasize that $d\varphi_c / d\mu$ is determined up to $\mu$-equivalence.

Proof. 1° Since $\Omega$ is a countable sum of sets for which $\mu$ and $\varphi$ are finite and since, by the Hahn decomposition theorem, $\varphi$ is a difference of two measures, it suffices to prove the theorem for finite measures $\mu$ and $\varphi$. Furthermore, if there are two decompositions of $\varphi$ into a $\mu$-continuous and a $\mu$-singular part:

\[ \varphi = \varphi_c + \varphi_s = \varphi'_c + \varphi'_s, \]

then

\[ \varphi_c - \varphi'_c = \varphi'_s - \varphi_s = 0, \]

for the $\mu$-continuous function $\varphi_c - \varphi'_c$ vanishes for all $\mu$-null sets while the $\mu$-singular function $\varphi'_s - \varphi_s$ vanishes outside a $\mu$-null set. Finally, an indefinite integral determines the integrand up to an equivalence: if, for every $A \in \mathcal{A}$,

\[ \varphi(A) = \int_A X = \int_A X', \]

then $X = X'$ a.e.; for, if, say, $\mu A = \mu[X - X' > \epsilon] > 0$, then

\[ \int_A (X - X') > 0. \]

Thus the uniqueness assertions hold if we prove the existence assertions under the assumption that $\mu$ and $\varphi$ are finite measures.

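2° Let $\Phi$ be the class of all nonnegative integrable functions $X$ whose indefinite integrals are majorized by $\varphi$:

\[ \int_A X \le \varphi(A), \qquad A \in \mathcal{A}. \]

$\Phi$ is not empty, since $X = 0$ belongs to it; and there is a sequence $\{X_n\} \subset \Phi$ such that

\[ \int X_n \to \sup_{X \in \Phi} \int X = \alpha \le \varphi(\Omega) < \infty. \]

Let $X'_n = \sup_{k \le n} X_k$, so that $0 \le X'_n \uparrow X = \sup X_n$. Let

\[ A_k = [X_k = X'_n], \qquad A'_1 = A_1, \qquad A'_k = A_1^c \cdots A_{k-1}^c A_k, \quad k = 2, \ldots, n, \]

so that

\[ \sum_{k=1}^{n} A'_k = \bigcup_{k=1}^{n} A_k = \Omega \]

and, for every $A$,

\[ \int_A X'_n = \sum_{k=1}^{n} \int_{A A'_k} X_k \le \sum_{k=1}^{n} \varphi(A A'_k) = \varphi(A). \]

Upon letting $n \to \infty$ and applying the monotone convergence theorem, we get

\[ \int_A X \le \varphi(A), \qquad \int X = \alpha. \]

Therefore $X$ is a "maximal" element of $\Phi$. This property will allow us to show that

\[ \varphi_s = \varphi - \varphi_c \quad (\ge 0), \]

where $\varphi_c$ is the indefinite integral of $X$, is $\mu$-singular, and the proof will be complete.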
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3° Let Dy + Dn° be a Hahn decomposition for the finite and o-addi- 
] 

tive set function g, = gs — — yu, that is, gn(4D,) S Oand ¢,(4D,°) 2 0 
n 


for every 4. Let D = () Dy (whence D° = UL D,°), so that, for every 
A and all n, 


< (4D) < ~u(AD). 
nN 


Upon letting 7 — », it follows that 9,(4D) = 0 and, hence, 9;(4) = 
ys(AD*). Since 


e(4) — o( A) _ gs(AD°) Ss v(4) — vs(AD,"), 


it follows that 


1 
f(x t ~Lo.) = ld) + ~ u( ADs °) < oA) — on(ADn’) ¥ 9A), 


1 
so that X+—-J,.€ &. But this conclusion is contradicted by 
n nr 


| 1 1 
[(%4 ato.) = + aude > 
nm” n 


unless uD,° = 0. Therefore, all sets D,° are u-null sets and so is 
their countable union D*®. Since ¢,(4) = 9;(4D°), it follows that 9, 
is w-singular, and the proof is complete. 
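
The decomposition can be made concrete on a finite space, where every set function is determined by its point masses. The following is a minimal sketch under that assumption, not the construction of the proof: the names `lebesgue_decomposition`, `measure` and `integral` are hypothetical helpers, and the set $[\mu = 0]$ plays the role of the $\mu$-null set $N$ carrying the singular part.

```python
from fractions import Fraction

def lebesgue_decomposition(mu, phi):
    """Split phi into a mu-continuous and a mu-singular part on a finite space.

    mu, phi: dicts mapping each point to its nonnegative mass.
    Returns (phi_c, phi_s, X) with phi = phi_c + phi_s and X = d(phi_c)/d(mu).
    """
    phi_c, phi_s, X = {}, {}, {}
    for w in mu:
        if mu[w] > 0:
            # Off the null set N = [mu = 0], phi is carried by the density phi/mu.
            phi_c[w], phi_s[w] = phi[w], Fraction(0)
            X[w] = Fraction(phi[w]) / Fraction(mu[w])
        else:
            # On N, all of phi is singular; X is arbitrary there (mu-equivalence).
            phi_c[w], phi_s[w] = Fraction(0), phi[w]
            X[w] = Fraction(0)
    return phi_c, phi_s, X

def measure(m, A):
    """Value of the set function with point masses m on the set A."""
    return sum(m[w] for w in A)

def integral(X, mu, A):
    """Indefinite integral of X with respect to mu over A."""
    return sum(X[w] * mu[w] for w in A)

mu = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
phi = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(3)}
phi_c, phi_s, X = lebesgue_decomposition(mu, phi)
```

Here `phi_s` vanishes outside the $\mu$-null set `{"c"}`, and `integral(X, mu, A)` reproduces `measure(phi_c, A)` for every subset `A`.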

In the particular case of a $\mu$-continuous $\varphi$, the foregoing theorem reduces to


B. RADON-NIKODYM THEOREM. If, on $\mathcal{A}$, the measure $\mu$ and the $\sigma$-additive set function $\varphi$ are $\sigma$-finite and $\varphi$ is $\mu$-continuous, then $\varphi$ is the indefinite integral of a finite function determined up to an equivalence.


We are now in a position to characterize indefinite integrals of finite 
functions on a o-finite measure space. 


C. A set function $\varphi$ on $\mathcal{A}$ is the indefinite integral on a $\sigma$-finite measure space of a finite function $X$ determined up to an equivalence, if, and only if, $\varphi$ is $\sigma$-finite, $\sigma$-additive, and $\mu$-continuous; and $X$ is integrable if, and only if, this $\varphi$ is finite.


The “if? assertion is the Radon-Nikodym theorem and the “only if” 
assertion is contained in the discussion at the beginning of this sub- 
section. 




COROLLARY. Let $\lambda$ and $\mu$ be $\sigma$-finite measures on $\mathcal{A}$. If $\mu$ is $\lambda$-continuous and $X$ is a measurable function whose integral $\int X \, d\mu$ exists, then, for every $A \in \mathcal{A}$,

$$\int_A X \, d\mu = \int_A X \frac{d\mu}{d\lambda} \, d\lambda.$$

Proof. If $X = I_B$, $B \in \mathcal{A}$, then the equality is valid, since

$$\int_A I_B \, d\mu = \mu AB = \int_{AB} \frac{d\mu}{d\lambda} \, d\lambda = \int_A I_B \frac{d\mu}{d\lambda} \, d\lambda.$$

It follows that the equality is valid for nonnegative simple functions and hence, by the monotone convergence theorem, for nonnegative measurable functions and, consequently, for measurable functions whose integral exists.
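
The corollary can be checked by hand on a finite space. The sketch below assumes a three-point space with point masses; the particular values of `lam`, `dmu_dlam` and `X` are illustrative only, and `both_sides` is a hypothetical helper returning the two members of the equality.

```python
from fractions import Fraction

# lambda, given by its point masses on the space {1, 2, 3}
lam = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}
# a chosen density d(mu)/d(lambda); mu is then lambda-continuous by construction
dmu_dlam = {1: Fraction(2), 2: Fraction(0), 3: Fraction(1)}
mu = {w: dmu_dlam[w] * lam[w] for w in lam}
# a finite measurable function (signed values are allowed)
X = {1: Fraction(3), 2: Fraction(-5), 3: Fraction(7)}

def integral(f, m, A):
    """Integral over A of f with respect to the measure with point masses m."""
    return sum(f[w] * m[w] for w in A)

def both_sides(A):
    """Return (integral of X d(mu), integral of X * d(mu)/d(lambda) d(lambda)) over A."""
    lhs = integral(X, mu, A)
    rhs = integral({w: X[w] * dmu_dlam[w] for w in lam}, lam, A)
    return lhs, rhs
```

For every subset `A` the two components of `both_sides(A)` coincide, which is the statement of the corollary in this discrete setting.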

Extension. The indefinite integral of a measurable function $X$ which is not necessarily finite is still $\sigma$-additive and $\mu$-continuous, but it is not necessarily $\sigma$-finite. The question arises whether the Radon-Nikodym theorem can be extended to this case. The answer is in the affirmative.


D. The Radon-Nikodym theorem remains valid if finiteness of $X$ and $\sigma$-finiteness of $\varphi$ are simultaneously suppressed therein.


Proof. As usual, it suffices to consider a finite measure $\mu$ and a $\mu$-continuous measure $\varphi$ on $\mathcal{A}$.

Let $\mathcal{B}$ be the class of all measurable sets $B$ such that $\varphi$ on $\{BA, A \in \mathcal{A}\}$ is $\sigma$-finite, and let $s$ be the supremum of $\mu$ on $\mathcal{B}$.

There exists a sequence $B_n \in \mathcal{B}$ such that $s = \lim \mu B_n$ and, hence, $B = \bigcup B_n \in \mathcal{B}$ with $\mu B = s$. If there exists a $C \in \{B^c A, A \in \mathcal{A}\}$ such that $0 < \varphi(C) < \infty$, then $B + C \in \mathcal{B}$, $\mu C > 0$, and

$$s \ge \mu(B + C) = \mu B + \mu C > s.$$

Therefore, while $\varphi$ on $\{BA, A \in \mathcal{A}\}$ is $\sigma$-finite, $\varphi$ on $\{B^c A, A \in \mathcal{A}\}$ can take values $0$ and $\infty$ only.

Furthermore, whatever be $C \in \{B^c A, A \in \mathcal{A}\}$, it is impossible to have $\mu C > 0$ and $\varphi(C) = 0$ since then $B + C \in \mathcal{B}$ and, as above, $s > s$. Since $\varphi$ is $\mu$-continuous, it is also impossible to have $\mu C = 0$ and $\varphi(C) > 0$. Thus, for every $C \in \{B^c A, A \in \mathcal{A}\}$, either $\mu C > 0$ and $\varphi(C) = \infty \cdot \mu C = \infty$ or $\mu C = 0$ and $\varphi(C) = 0$. In other words, $\varphi$ on $\{B^c A, A \in \mathcal{A}\}$ is the indefinite integral of a function $X = \infty$ on $B^c$, determined up to an equivalence. On the other hand, by B, $\varphi$ on $\{BA, A \in \mathcal{A}\}$ is the indefinite integral of a function $X$ on $B$, determined up to an equivalence. These values of $X$ on $B$ and on $B^c$ determine it on $\Omega$, up to an equivalence and, for every $A \in \mathcal{A}$,

$$\int_A X = \int_{AB} X + \int_{AB^c} X = \varphi(AB) + \varphi(AB^c) = \varphi(A).$$

The extension follows.

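8.2 Product measures and iterated integrals. Let $(\Omega_i, \mathcal{A}_i, \mu_i)$, $i = 1, 2$, be two measure spaces. A space $(\Omega, \mathcal{A}, \mu)$ is their product-measure space if


$\Omega = \Omega_1 \times \Omega_2$ is the space of all points $\omega = (\omega_1, \omega_2)$, $\omega_i \in \Omega_i$;

$\mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2$ is the minimal $\sigma$-field over the class of all measurable "rectangles" $A_1 \times A_2$, $A_i \in \mathcal{A}_i$, where $A_1 \times A_2$ is the set of all points $\omega$ with $\omega_i \in A_i$;

$\mu = \mu_1 \times \mu_2$ is the "product-measure" on $\mathcal{A}$, provided it exists, that is, is a measure on $\mathcal{A}$ uniquely determined by the relations $\mu(A_1 \times A_2) = \mu_1 A_1 \times \mu_2 A_2$ for all measurable rectangles $A_1 \times A_2$.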


We intend to find conditions under which the product-measure ex- 
ists and conditions under which integrals with respect to this measure 
can be expressed in terms of integrals with respect to the factor measures $\mu_i$. In what follows the subscripts 1 and 2 can be interchanged.
We shall also frequently proceed to the usual abuse of notation which 
consists in the use of the same symbol for a function and for its values. 

For every set $A \subset \Omega$, the section $A_{\omega_1}$ of $A$ at $\omega_1$ is the set of all points $\omega_2$ such that $(\omega_1, \omega_2) \in A$. For every function $X$ on $\Omega$, the section $X_{\omega_1}$ of $X$ at $\omega_1$ is the function defined on $\Omega_2$ by $X_{\omega_1}(\omega_2) = X(\omega_1, \omega_2)$.


a. Every section of a measurable set or function is measurable.


If $\mathcal{C}$ is the class of all the sets in $\Omega$ whose every section is measurable, then it is readily seen that $\mathcal{C}$ is a $\sigma$-field. But every section of a measurable rectangle $A_1 \times A_2$ is measurable, since it is either empty or is one of the sides. Therefore, $\mathcal{C} \supset \mathcal{A}$ and the first assertion is proved. If $X$ on $\Omega$ is measurable and $S \subset R$ is an arbitrary Borel set, the second assertion follows by

$$X_{\omega_1}^{-1}(S) = [\omega_2;\ X_{\omega_1}(\omega_2) \in S] = [\omega_2;\ X(\omega_1, \omega_2) \in S] = [\omega_2;\ (\omega_1, \omega_2) \in X^{-1}(S)] = (X^{-1}(S))_{\omega_1}.$$

A. PRODUCT-MEASURE THEOREM. If $\mu_1$ on $\mathcal{A}_1$ and $\mu_2$ on $\mathcal{A}_2$ are $\sigma$-finite, then, for every $A \in \mathcal{A}_1 \times \mathcal{A}_2$, the functions with values $\mu_1 A_{\omega_2}$ and $\mu_2 A_{\omega_1}$ are measurable, and the set function $\mu$ with values

$$\mu A = \int (\mu_1 A_{\omega_2}) \, d\mu_2 = \int (\mu_2 A_{\omega_1}) \, d\mu_1$$

is a $\sigma$-finite measure $\mu$ on $\mathcal{A}_1 \times \mathcal{A}_2$ uniquely determined by the relation

$$\mu(A_1 \times A_2) = \mu_1 A_1 \times \mu_2 A_2, \quad A_i \in \mathcal{A}_i.$$

In other words, $\mu$ is the product-measure $\mu_1 \times \mu_2$.

Proof. The proof is based upon the fact that, by the monotone convergence theorem, the class $\mathcal{M}$ of all those sets $A$ for which the integrals are equal is closed under formation of countable sums.

Since the measures $\mu_1$ and $\mu_2$ are $\sigma$-finite, the product space is decomposable into a countable sum of rectangles with sides of finite measure. It follows that, without restricting the generality, we can suppose that these measures are finite. If $A = A_1 \times A_2$ is a measurable rectangle, then $\mu_1 A_{\omega_2} = \mu_1 A_1 \times I_{A_2}(\omega_2)$ and similarly by interchanging the subscripts 1 and 2. Thus, the functions with these values are measurable and both integrals reduce to $\mu_1 A_1 \times \mu_2 A_2$. The last asserted equality is proved and $\mathcal{M}$ contains all measurable rectangles. It follows that $\mathcal{M}$ contains the field of finite sums of these rectangles. But $\mathcal{M}$ is closed under nondecreasing passages to the limit, on account of the monotone convergence theorem, and, under nonincreasing ones, on account of the dominated convergence theorem and the finiteness of measures. Therefore, by 1.6, it contains the product $\sigma$-field $\mathcal{A}_1 \times \mathcal{A}_2$, and the equality of the integrals is proved. The finite set function $\mu$ on $\mathcal{A}$ so defined is a measure, on account of the monotone convergence theorem, and it is uniquely determined by the stated relation, on account of the extension theorem. This terminates the proof.


COROLLARY. $A \in \mathcal{A}_1 \times \mathcal{A}_2$ is a $(\mu_1 \times \mu_2)$-null set if, and only if, almost every section $A_{\omega_1}$ is a $\mu_2$-null set.


For the integral of a nonnegative function vanishes if, and only if, the integrand vanishes a.e.

We are now in a position to answer the second stated question. The result is due to Lebesgue and Fubini and is generally called the FUBINI THEOREM.

B. ITERATED INTEGRALS THEOREM. Let $(\Omega_1, \mathcal{A}_1, \mu_1)$ and $(\Omega_2, \mathcal{A}_2, \mu_2)$ be $\sigma$-finite measure spaces.

If the $\mathcal{A}_1 \times \mathcal{A}_2$-measurable function $X$ on $\Omega_1 \times \Omega_2$ is nonnegative or $\mu_1 \times \mu_2$-integrable, then

$$\int_{\Omega_1 \times \Omega_2} X \, d(\mu_1 \times \mu_2) = \int_{\Omega_1} d\mu_1 \int_{\Omega_2} X_{\omega_1} \, d\mu_2 = \int_{\Omega_2} d\mu_2 \int_{\Omega_1} X_{\omega_2} \, d\mu_1$$

and in the integrability case almost every section of $X$ is integrable.


The iterated integrals are to be read from right to left.

Proof. For $X = I_A$, the asserted equality reduces to that of the product-measure theorem. It follows that it holds for simple functions and hence holds for nonnegative measurable functions because of the monotone convergence theorem, since, if $0 \le X_n \uparrow X$, then $0 \le (X_n)_{\omega_1} \uparrow (X)_{\omega_1}$. If $X \ge 0$ is integrable, then the function $\int X_{\omega_1} \, d\mu_2$ of $\omega_1$ is integrable and hence a.e. finite, so that the functions $X_{\omega_1}$ of $\omega_2$ are almost all integrable. Therefore, if $X = X^+ - X^-$ is integrable, that is, $X^+$ and $X^-$ are integrable, then $(X)_{\omega_1} = (X^+)_{\omega_1} - (X^-)_{\omega_1}$ are almost all integrable and a.e. finite. This terminates the proof.
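
On finite factor spaces both theorems reduce to rearrangements of finite sums, which makes them easy to verify directly. The sketch below assumes two small finite measure spaces given by point masses; the helper names `section_at_w1`, `measure_by_sections` and `iterated_integrals` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# Two finite measure spaces given by their point masses (not probabilities).
mu1 = {"x": Fraction(1, 3), "y": Fraction(2, 3)}
mu2 = {0: Fraction(1, 2), 1: Fraction(3, 2)}

def product_measure(A):
    """(mu1 x mu2)(A) for a set A of pairs (w1, w2)."""
    return sum(mu1[w1] * mu2[w2] for (w1, w2) in A)

def section_at_w1(A, w1):
    """Section of A at w1: all w2 with (w1, w2) in A."""
    return {w2 for (v1, w2) in A if v1 == w1}

def measure_by_sections(A):
    """Integral over w1 of mu2(section of A at w1), as in the product-measure theorem."""
    return sum(mu1[w1] * sum(mu2[w2] for w2 in section_at_w1(A, w1)) for w1 in mu1)

def iterated_integrals(X):
    """Return (double integral, w2-then-w1 iterated integral, w1-then-w2 iterated integral)."""
    double = sum(X(w1, w2) * mu1[w1] * mu2[w2] for w1, w2 in product(mu1, mu2))
    first = sum(mu1[w1] * sum(X(w1, w2) * mu2[w2] for w2 in mu2) for w1 in mu1)
    second = sum(mu2[w2] * sum(X(w1, w2) * mu1[w1] for w1 in mu1) for w2 in mu2)
    return double, first, second
```

For any set `A` of pairs, `product_measure(A)` and `measure_by_sections(A)` agree, and the three values returned by `iterated_integrals` coincide for any real function `X(w1, w2)`.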

Finite-dimensional case. What precedes extends in an obvious man- 
ner to the product of an arbitrary but finite number of measure spaces. 
The interesting case is the infinitely dimensional one, and we shall now 
investigate it from a somewhat more general point of view. 

*8.3 Iterated integrals and infinite product spaces. In what follows we push the abuse of notation to its extreme.

We consider a sequence of measurable spaces $(\Omega_n, \mathcal{A}_n)$ and denote by $\omega_n$ points of $\Omega_n$ and by $A_n$ measurable sets in $\Omega_n$ (sets of $\mathcal{A}_n$). The product measurable space $(\Omega_1 \times \cdots \times \Omega_n, \mathcal{A}_1 \times \cdots \times \mathcal{A}_n)$ is the space of points $(\omega_1, \cdots, \omega_n)$ together with the minimal $\sigma$-field over the intervals $A_1 \times \cdots \times A_n$. The product measurable space $(\prod \Omega_n, \prod \mathcal{A}_n)$ is the space of points $(\omega_1, \omega_2, \cdots)$ and the minimal $\sigma$-field over all cylinders of the form $A_1 \times \cdots \times A_n \times \prod_{k=n+1}^{\infty} \Omega_k$ or, equivalently, over all cylinders of the form $C(B_n) = B_n \times \prod_{k=n+1}^{\infty} \Omega_k$ where the base $B_n$ is a measurable set in $\Omega_1 \times \cdots \times \Omega_n$.

In the infinitely dimensional case, we must, for reasons of "consistency" (to be made clear later), limit ourselves to probabilities, that is, to measures which assign value one to the space, to be denoted by $P, Q, \cdots$, with or without affixes. Furthermore, in probability theory, the following more general concept plays a basic role (at least when "independence", see Part III, is not assumed). Every function, to be denoted by $P_n(\omega_1, \cdots, \omega_{n-1}; A_n)$, which is a probability in $A_n$ for every fixed point $(\omega_1, \cdots, \omega_{n-1})$ and a measurable function in this point for every fixed $A_n$ will be called a regular conditional probability. For $n = 1$ it reduces to a probability $P_1$ on $\mathcal{A}_1$ but for $n > 1$ it reduces to a probability on $\mathcal{A}_n$ only when it is constant in $(\omega_1, \cdots, \omega_{n-1})$ for every fixed $A_n$, provided the ordered T has a first element. We observe that the functions $\mu_2 A_{\omega_1} = \mu_2(\omega_1; A_{\omega_1})$ are regular conditional probabilities when $\mu_2(\omega_1; \Omega_2) = 1$. On account of the monotone convergence theorem, iterated integrals of the form

$$Q_n B_n = \int P_1(d\omega_1) \int P_2(\omega_1; d\omega_2) \cdots \int P_n(\omega_1, \cdots, \omega_{n-1}; d\omega_n) \, I_{B_n}(\omega_1, \cdots, \omega_n)$$

define probabilities $Q_n$ on $\mathcal{A}_1 \times \cdots \times \mathcal{A}_n$. It follows by the same theorem that if a measurable function $X$ on $\Omega_1 \times \cdots \times \Omega_n$ is nonnegative or $Q_n$-integrable, then

$$\int_{\Omega_1 \times \cdots \times \Omega_n} X \, dQ_n = \int P_1(d\omega_1) \int P_2(\omega_1; d\omega_2) \cdots \int P_n(\omega_1, \cdots, \omega_{n-1}; d\omega_n) \, X(\omega_1, \cdots, \omega_n).$$

A. ITERATED REGULAR CONDITIONAL PROBABILITIES THEOREM. The iterated integrals

$$QC(B_n) = \int P_1(d\omega_1) \int P_2(\omega_1; d\omega_2) \cdots \int P_n(\omega_1, \cdots, \omega_{n-1}; d\omega_n) \, I_{B_n}(\omega_1, \cdots, \omega_n)$$

determine a probability $Q$ on $\prod \mathcal{A}_n$.

This extension of the product-probability theorem is due to Tulcea and, proceeding as therein (in 1°), permits one to determine $Q$ on an arbitrary $\prod_{t \in T} \mathcal{A}_t$ under obvious consistency conditions on the regular conditional pr.'s $P_{t_1 \cdots t_n}(\omega_{t_1}, \cdots, \omega_{t_{n-1}}; A_{t_n})$.

Proof. To begin with, the definition of $Q$ on the class $\mathcal{C}$ of all cylinders of the form $C(B_n)$ is consistent. For, if $C(B_n) = C(B_m)$, $m < n$, then integrations with respect to the $\omega_k$ which do not belong to the product subspace where $B_m$ lies yield factors one.

Since $Q$ on $\mathcal{C}$ is finitely additive, the assertion will follow by the extension theorem if we prove that $Q$ on $\mathcal{C}$ is continuous at $\emptyset$. We have to consider nonincreasing sequences of cylinders which converge to $\emptyset$. Upon renumbering the indices, we can suppose that the sequences are of the form $C(B_n) \downarrow \emptyset$ with nonempty bases $B_n \in \mathcal{A}_1 \times \cdots \times \mathcal{A}_n$. We can write

$$(1) \qquad QC(B_n) = \int P_1(d\omega_1) \, Q^{(1)}C(B_n)_{\omega_1}$$

where $(B_n)_{\omega_1}$ is the section of $B_n$ at $\omega_1$ and

$$Q^{(1)}C(B_n)_{\omega_1} = \int P_2(\omega_1; d\omega_2) \cdots \int P_n(\omega_1, \cdots, \omega_{n-1}; d\omega_n) \, I_{B_n}(\omega_1, \cdots, \omega_n).$$

In (1) the left-hand side is nonincreasing in $n$, and the integrand converges nonincreasingly to a certain limit $X_1(\omega_1) \ge 0$. By the dominated convergence theorem, the limit of the left-hand side is $\int P_1(d\omega_1) X_1(\omega_1)$.

Assume that this integral is positive. Then there exists a point $\bar\omega_1$ such that $X_1(\bar\omega_1) > 0$. It follows that we find ourselves in the same situation but with the sequence $Q^{(1)}C(B_n)_{\bar\omega_1}$ instead of $QC(B_n)$. Repeating the argument over and over again, we obtain a sequence $\bar\omega = (\bar\omega_1, \bar\omega_2, \cdots)$ such that $\bar\omega_n \in \Omega_n$ and $Q^{(n)}C(B_m)_{\bar\omega_1, \cdots, \bar\omega_n} \downarrow X_n(\bar\omega_n) > 0$. Therefore, every $C(B_n)$ contains at least one point of the form $(\bar\omega_1, \cdots, \bar\omega_n, \omega_{n+1}, \cdots)$. Since $C(B_n) = B_n \times \prod_{k=n+1}^{\infty} \Omega_k$, it contains the point $\bar\omega$ and, hence, $\bar\omega \in \bigcap C(B_n)$. Thus, when $QC(B_n) \not\to 0$ the intersection is not empty, and the theorem follows ad contrario.

Particular cases. 1° If $P_n(\omega_1, \cdots, \omega_{n-1}; A_n) = P_n A_n$ are constant for every fixed $A_n$, then we write $Q = \prod P_n$ and call it a product-probability. Then the theorem reduces to the product-probability theorem in the denumerable case (4.2A).

2° If the factor spaces are finite-dimensional Borel spaces, then, it follows from 27.2, Application 1, that the theorem yields the consistency theorem.
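
When every factor space is finite, the iterated integrals defining $Q_n B_n$ are finite sums and the consistency used in the proof can be checked numerically. The sketch below assumes a two-point state space and an illustrative kernel that depends only on the last coordinate; `P1`, `Pn`, `Q` and `cylinder_base` are hypothetical names, not notation from the text.

```python
from fractions import Fraction
from itertools import product

STATES = (0, 1)

def P1(w1):
    """Initial probability P_1({w1}): uniform on the two states."""
    return Fraction(1, 2)

def Pn(history, w):
    """Hypothetical kernel P_n(w_1..w_{n-1}; {w}): repeat the last state w.p. 3/4."""
    return Fraction(3, 4) if w == history[-1] else Fraction(1, 4)

def Q(n, base):
    """Q_n(B_n) for a base given as a set of n-tuples, as an iterated sum of kernels."""
    total = Fraction(0)
    for path in product(STATES, repeat=n):
        if path in base:
            p = P1(path[0])
            for k in range(1, n):
                p *= Pn(path[:k], path[k])
            total += p
    return total

def cylinder_base(base, extra):
    """Base, in dimension n + extra, of the cylinder whose n-dimensional base is given."""
    return {b + tail for b in base for tail in product(STATES, repeat=extra)}
```

Because each kernel is a probability in its last argument, `Q(n + m, cylinder_base(B, m))` equals `Q(n, B)`: the added integrations "yield factors one", which is the consistency noted at the start of the proof.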


COMPLEMENTS AND DETAILS 


Notation. Unless otherwise stated, the measure space $(\Omega, \mathcal{A}, \mu)$ is fixed, the (measurable) sets $A, B, \cdots$, with or without affixes, belong to $\mathcal{A}$, and the functions $X, Y, \cdots$, with or without affixes, are finite measurable functions.

1. The set $C$ of convergence of a sequence $X_n$ (to a finite or infinite limit function) is measurable.

($C = [\liminf X_n = \limsup X_n]$.)

2. If $\mu$ is finite, then given $X$, for every $\epsilon > 0$ there exists $A$ such that $\mu A < \epsilon$ and $X$ is bounded on $A^c$. If $X$ is bounded, then there exists a sequence of simple functions which converges uniformly to $X$. Combine both propositions.

We say that a sequence $X_n$ converges almost uniformly (a.u.) to $X$, and write $X_n \xrightarrow{\text{a.u.}} X$, if, for every $\epsilon > 0$, there exists a set $A$ with $\mu A < \epsilon$ such that $X_n \xrightarrow{\text{unif.}} X$ on $A^c$.

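3. If $X_n \xrightarrow{\text{a.u.}} X$, then $X_n \xrightarrow{\text{a.e.}} X$ and $X_n \xrightarrow{\mu} X$. (For the first assertion, form $\bigcap A_n$ where $A_n$ is the $A$ of the foregoing definition with $\epsilon = \frac{1}{n}$.)

4. If $X_n \xrightarrow{\mu} X$, then there exists a subsequence $X_{n_k} \xrightarrow{\text{a.u.}} X$.

5. Egoroff's theorem. If $\mu$ is finite, then $X_n \xrightarrow{\text{a.e.}} X$ implies that $X_n \xrightarrow{\text{a.u.}} X$. Compare with 3. (Neglect the null set of divergence, and form $A = \bigcup_{m=1}^{\infty} A_m$, with $A_m = \bigcup_{k=n(m)}^{\infty} \left[ |X_k - X| \ge \frac{1}{m} \right]$ and $n(m)$ such that $\mu A_m < \frac{\epsilon}{2^m}$.)

6. Lusin's theorem. If $\mu$ is $\sigma$-finite, then $X_n \xrightarrow{\text{a.e.}} X$ implies that $X_n \xrightarrow{\text{unif.}} X$ on every element $A_j$ of some countable partition of $\Omega - N$ where $N$ is some null set. (Neglect the null set of divergence, and start with $\mu$ finite. Use Egoroff's theorem to select inductively sets $A_k$ such that $\mu \bigcap_{k=1}^{n} A_k < \frac{1}{n}$ and $X_n \xrightarrow{\text{unif.}} X$ on $A_k^c$ for every $k$.)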


7. If $\mu$ is finite, then $X_n \xrightarrow{\text{a.e.}} X$ implies existence of a set of positive measure on which the $X_n$ are uniformly bounded. What if $\mu$ is $\sigma$-finite?

8. If $\mu$ is finite, then $X_{mn} \xrightarrow{\text{a.e.}} X_m$ as $n \to \infty$ and $X_m \xrightarrow{\text{a.e.}} X$ as $m \to \infty$ imply that there exist subsequences $m_k$, $n_k$ such that $X_{m_k n_k} \xrightarrow{\text{a.e.}} X$ as $k \to \infty$. What if $\mu$ is $\sigma$-finite?


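(Neglect the null sets of divergence. Select $A_k$ and $m_k$ such that $\mu A_k < \frac{1}{2^k}$ and $|X_{m_k} - X| < \frac{1}{2^k}$ on $A_k^c$. Select $B_k \subset A_k^c$ and $n_k$ such that $\mu B_k < \frac{1}{2^k}$ and $|X_{m_k n_k} - X_{m_k}| < \frac{1}{2^k}$ on $A_k^c - B_k$.)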


9. Let $X_n \xrightarrow{\mu} X$, $Y_n \xrightarrow{\mu} Y$. Do $aX_n + bY_n \xrightarrow{\mu} aX + bY$, $|X_n| \xrightarrow{\mu} |X|$, $X_n^2 \xrightarrow{\mu} X^2$, $X_n Y_n \xrightarrow{\mu} XY$? What about $1/X_n$? Let $\mu$ be finite and let $g$ on $R$ or on $R \times R$ be continuous. What about the sequences $g(X_n)$ and $g(X_n, Y_n)$?

10. Let the functions $X_n$, $X$ on the measure space be complex-valued or vector-valued or, more generally, let them take their values in some fixed Banach space. Denote the norm of $X$ by $|X|$, and denote $|X_n - X| \to 0$ by $X_n \to X$.

Transpose the constructive definitions of measurability and the definitions of 
various types of convergence. Investigate the validity of the transposed of the 
corresponding properties established in the text, as well as of those stated above. 

11. Examples and counterexamples of mutual implications of types of convergence. Investigate convergences of the sequences defined below:

(i) The measure space is the Borel line with Lebesgue measure, $X_n = 1$ on $[n, n + 1]$ and $X_n = 0$ elsewhere.

(ii) The measure space is the Borel interval $(0, 1)$ with Lebesgue measure, $X_n = 1$ on $\left(0, \frac{1}{n}\right)$ and $X_n = 0$ elsewhere.

(iii) The measure space is the Borel interval $[0, 1]$ with Lebesgue measure, the sequence is $X_{11}, X_{21}, X_{22}, X_{31}, X_{32}, X_{33}, \cdots$ with $X_{nk} = 1$ on $\left[\frac{k-1}{n}, \frac{k}{n}\right]$ and $X_{nk} = 0$ elsewhere.

(iv) $\mathcal{A}$ consists of all subsets of the set of positive integers, $\mu A$ is the number of points of $A$, $X_n$ is indicator of the set of the $n$ first integers.
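
Example (iii) is the classical sequence that converges in measure but at no point. The sketch below assumes the reading $X_{nk} = 1$ on $[(k-1)/n, k/n]$ and evaluates one row of the triangular array at a fixed point; `X`, `measure_of_support` and `row_values` are hypothetical helper names.

```python
from fractions import Fraction

def X(n, k, w):
    """Indicator of [(k-1)/n, k/n] evaluated at the point w of [0, 1]."""
    return 1 if Fraction(k - 1, n) <= w <= Fraction(k, n) else 0

def measure_of_support(n, k):
    """Lebesgue measure of the set where X_nk differs from 0."""
    return Fraction(k, n) - Fraction(k - 1, n)

def row_values(n, w):
    """Values X_n1(w), ..., X_nn(w) of the n-th row at the point w."""
    return [X(n, k, w) for k in range(1, n + 1)]
```

The support of $X_{nk}$ has measure $1/n \to 0$, so the sequence converges to $0$ in measure; yet for a fixed point such as `Fraction(1, 3)` every row with $n \ge 2$ contains both a `1` and a `0`, so the sequence of values does not converge there.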
12. If $X$ is integrable, then the set $[X \ne 0]$ is of $\sigma$-finite measure. What if $\int X$ exists? $\left( \mu\left[ |X| \ge \frac{1}{n} \right] \le n \int |X|. \right)$


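13. Let $(T, \mathcal{T}, \tau)$ be a measure space, to every point $t$ of which is assigned a measure $\mu_t$ on $\mathcal{A}$. Let the function on $T$ defined by $\mu_t A$ for any fixed $A$ be $\mathcal{T}$-measurable.

The relation $\mu A = \int_T \mu_t A \, d\tau(t)$ defines a measure $\mu$ on $\mathcal{A}$. If $\int_\Omega X(\omega) \, d\mu(\omega)$ exists, then the function defined on $T$ by $U(t) = \int_\Omega X(\omega) \, d\mu_t(\omega)$ exists and is $\mathcal{T}$-measurable, and $\int_\Omega X(\omega) \, d\mu(\omega) = \int_T U(t) \, d\tau(t)$.

14. Let $\varphi$ be the indefinite integral of $X$. Express $\varphi^+$, $\varphi^-$, $\bar\varphi$ in terms of $X$.

15. If $\int_A X_n \to 0$ uniformly in $n$ as $\mu A \to 0$ or as $A \downarrow \emptyset$, then the same is true of $\int_A |X_n|$; and conversely. Interpret in terms of signed measures.

$$\left( \int_A |X_n| = \int_{A[X_n \ge 0]} X_n - \int_{A[X_n < 0]} X_n. \right)$$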


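16. If finite $\int_A X_n \to \int_A X$ finite, uniformly in $A$ ($\in \mathcal{A}$), then $\int_\Omega |X_n - X| \to 0$; and conversely.

17. If $0 \le X_n \xrightarrow{\text{a.e.}} X$, then finite $\int_\Omega X_n \to \int_\Omega X$ finite implies that $\int_A X_n \to \int_A X$ uniformly in $A$ (also if $\xrightarrow{\text{a.e.}}$ is replaced by $\xrightarrow{\mu}$).

($0 \le (X - X_n)^+ \le X$ integrable, and $\int (X - X_n)^+ \to 0$, $\int (X - X_n) \to 0$.)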


18. Rewrite in terms of integrals as many as possible of the complements and 
details of Chapter I. 

19. If the $X_n$ are integrable and $\lim \int_A X_n$ exists and is finite for every $A$, then the $\int |X_n|$ are uniformly bounded, $\int_A |X_n| \to 0$ uniformly in $n$ as $\mu A \to 0$ and as $A \downarrow \emptyset$, and there exists an integrable $X$, determined up to an equivalence, such that $\int_A X_n \to \int_A X$ for every $A$. (Use 18.)

20. If integrable $X_n \to X$ integrable, then existence and finiteness of $\lim \int_A X_n$ for every $A$ are equivalent to the following properties:

(i) $\int_A X_n \to \int_A X$ uniformly in $A$;

(ii) $\int_A |X_n| \to 0$ uniformly in $n$ as $\mu A \to 0$ and as $A \downarrow \emptyset$.

If $\mu$ is finite, then "as $A \downarrow \emptyset$" can be suppressed. (Use the preceding propositions and the relations

$$\int_A |X_n| \le \int |X_n - X| + \int_A |X|,$$

$$\int_A |X_n - X| \le \epsilon \mu A + \int_{A[|X_n - X| \ge \epsilon]} (|X_n| + |X|).)$$
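21. The differential formalism applies to Radon-Nikodym derivatives: Let $\mu$, $\nu$ be finite measures on $\mathcal{A}$ and $\varphi$, $\varphi'$ be $\sigma$-finite signed measures on $\mathcal{A}$. Let $\varphi$ be $\nu$-continuous and $\nu$, $\varphi$, $\varphi'$ be $\mu$-continuous. Then

$$\frac{d(\varphi + \varphi')}{d\mu} = \frac{d\varphi}{d\mu} + \frac{d\varphi'}{d\mu} \quad \mu\text{-a.e.},$$

$$\frac{d\varphi}{d\mu} = \frac{d\varphi}{d\nu} \frac{d\nu}{d\mu} \quad \mu\text{-a.e.}$$

(For the second assertion, it suffices to consider $\varphi \ge 0$, $X = \frac{d\varphi}{d\nu} \ge 0$, $Y = \frac{d\nu}{d\mu} \ge 0$. Take simple $X_n$ with $0 \le X_n \uparrow X$ so that

$$\int X \, d\nu \leftarrow \int X_n \, d\nu = \int X_n Y \, d\mu \to \int XY \, d\mu.)$$

Let $\{\mu_t, t \in T\}$ and $\{\mu'_{t'}, t' \in T'\}$ be two families of measures on $\mathcal{A}$; we drop $t \in T$ and $t' \in T'$ unless confusion is possible. We say that $\{\mu_t\}$ is $\{\mu'_{t'}\}$-continuous if every set null for all $\mu'_{t'}$ is null for all $\mu_t$. If the converse is also true, we say that the two families are mutually continuous.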

22. If $\{\mu_j\}$ is a countable family of finite measures, then there exists a finite measure $\mu$ such that $\{\mu_j\}$ and $\mu$ are mutually continuous. (Take $\mu = \sum_j \mu_j / 2^j \mu_j \Omega$.)
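
The hint of 22 can be tried out on a finite space with a finite family, where "null set" simply means a set of points of zero mass. The sketch below is an illustration under that assumption; `dominating_measure` and `null_points` are hypothetical helpers, and identically zero measures are skipped since the normalization $\mu_j \Omega$ would vanish.

```python
from fractions import Fraction

def dominating_measure(family, space):
    """mu = sum_j mu_j / (2^j * mu_j(space)), skipping identically zero measures."""
    mu = {w: Fraction(0) for w in space}
    for j, mu_j in enumerate(family, start=1):
        total = sum(mu_j[w] for w in space)
        if total == 0:
            continue  # a zero measure makes every set null and contributes nothing
        for w in space:
            mu[w] += Fraction(mu_j[w]) / (2 ** j * total)
    return mu

def null_points(m, space):
    """Points of zero mass; a set is null exactly when it lies inside this set."""
    return {w for w in space if m[w] == 0}
```

A point is null for the resulting `mu` exactly when it is null for every member of the family, which is mutual continuity in this discrete setting; the total mass of `mu` is at most $\sum_j 2^{-j} \le 1$, so `mu` is finite.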

23. Let the $\mu_t$ and $\mu$ be finite measures. If $\{\mu_t\}$ is $\mu$-continuous, then there exists a finite measure $\mu'$ such that $\{\mu_t\}$ and $\mu'$ are mutually continuous. (Select sets $A_t = \left[ \frac{d\mu_t}{d\mu} > 0 \right]$. Denote by $B$, with or without affixes, sets such that, for some $t$, $B \subset A_t$ and $\mu B > 0$. Denote countable sums of sets $B$ up to $\mu$-null sets by $C$, with or without affixes. Every subset $C' \subset C$ with $\mu C' > 0$ is a set $C$; every countable union of sets $C$ is a set $C$. Let $\mu C_n \to s$ where $s$ is the supremum of values of $\mu$ over all the sets $C$. Then $s = \mu \bigcup C_n = \mu \bigcup B_m$ and to every $m$ there corresponds a $\mu_t$, say $\mu_m$, such that $B_m \subset A_m$ and $\mu_m B_m > 0$. The families $\{\mu_t\}$ and $\{\mu_m\}$ are mutually continuous.)




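24. Let $\bar\mu_n = \sum_{k=1}^{n} \mu_k \to \bar\mu$ and $\bar\nu_n = \sum_{k=1}^{n} \nu_k \to \bar\nu$, all the $\mu$ and $\nu$ with various affixes being finite measures on $\mathcal{A}$ and every $\bar\nu_n$ being $\bar\mu_n$-continuous.

(ii) if $\{\mu_n\}$ is $\nu$-continuous, then $\dfrac{d\bar\mu_n}{d\nu} \to \dfrac{d\bar\mu}{d\nu}$ a.e.

(iii) $\bar\nu$ is $\bar\mu$-continuous and $\dfrac{d\bar\nu_n}{d\bar\mu_n} \to \dfrac{d\bar\nu}{d\bar\mu}$ $\bar\mu$-a.e.

(For the last assertion, if $\bar\mu_n A_n = 0$ for all $n$, then $\bar\mu(\limsup A_n) = 0$. It follows that it suffices to consider a particular choice of the $\dfrac{d\bar\nu_n}{d\bar\mu_n} = \sum_{k=1}^{n} X_k \Big/ \sum_{k=1}^{n} Y_k$ where $X_k = \dfrac{d\nu_k}{d\bar\mu}$, $Y_k = \dfrac{d\mu_k}{d\bar\mu}$. But $\sum X_n = \dfrac{d\bar\nu}{d\bar\mu}$ and $\sum Y_n = 1$ $\bar\mu$-a.e.)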

The propositions which follow correspond to various definitions of the concept 
of integration. We shall assume that the measures and the functions are finite. 
Besides proving the statements, the reader should also examine removal of the 
restriction of finiteness as well as of other restrictions which may be introduced. 


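25. Set

$$\int X \, d\varphi = \int X \, d\varphi^+ - \int X \, d\varphi^-, \quad \int (X + iY) \, d\mu = \int X \, d\mu + i \int Y \, d\mu,$$

$$\int X \, d(\mu + i\nu) = \int X \, d\mu + i \int X \, d\nu$$

and investigate existence and properties of integrals so defined.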

26. Descriptive approach. The Radon-Nikodym theorem characterizes an in- 
definite integral but not that of a given function. The following proposition 
answers this requirement. 

$\varphi$ on $\mathcal{A}$ is indefinite integral of $X$ on $\Omega$ if, and only if, $\varphi$ is $\sigma$-additive and, for every set $A = [a \le X \le b]B$, $B \in \mathcal{A}$,

$$a \mu A \le \varphi(A) \le b \mu A.$$


27. In the definition of the integral given in the text, start with (nonnega- 
tive) elementary functions instead of simple ones. The integral so defined coin- 
cides with the initial one. 

28. Lebesgue’s approach. The Cauchy-Riemann approach starts with arbi- 
trary finite partitions of the interval of integration into intervals. The Lebesgue 
approach consists in partitioning the set of integration according to the function 
to be integrated so that the integral is tailored to order as opposed to the ready-to-wear Cauchy-Riemann one. Let $\mu < \infty$.

Set

$$\Sigma_n(X) = \sum_{k=-\infty}^{+\infty} \frac{k-1}{2^n} \, \mu\left[ \frac{k-1}{2^n} \le X < \frac{k}{2^n} \right].$$

If $X$ is bounded, these sums correspond to finite partitions and $\int X = \lim \Sigma_n(X)$. If $X$ is not bounded, set $X_{mn} = X$ if $-m \le X \le n$ and $X_{mn} = 0$ otherwise. If $X$ is integrable, then $\int X_{mn} \to \int X$ as $m, n \to \infty$.

If $X$ is not bounded, the series $\Sigma_n(X)$ correspond to countable partitions and $\int X = \lim \Sigma_n(X)$, in the sense that if $X$ is integrable, then these series are absolutely convergent and the equality holds and, conversely, if one of these series is absolutely convergent, so are all of them and the equality holds.
(For the last assertion, it suffices to consider nonnegative elementary functions $X_n = \sum_{k=1}^{\infty} \frac{k-1}{2^n} I_{\left[ \frac{k-1}{2^n} \le X < \frac{k}{2^n} \right]}$. For the converse, use the relation

X $ 2X, + uO) mS 


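29. Darboux-Young approach. Let $X$ be measurable or not and set

$$\underline{\int} X = \sup \sum_{k=1}^{n} \inf_{\omega \in A_k} X(\omega) \, \mu A_k, \quad \overline{\int} X = \inf \sum_{k=1}^{n} \sup_{\omega \in A_k} X(\omega) \, \mu A_k$$

where the extrema of sums are taken over all finite measurable partitions $\sum_{k=1}^{n} A_k = \Omega$. If $X$ is measurable and bounded, then

$$\underline{\int} X = \overline{\int} X = \int X.$$

If $\underline{\int} X$ and $\overline{\int} X$ exist and are equal, we say that $\int X$ exists and equals their common value.

We can also set

$$\underline{\int} X = \sup \int Y, \quad \overline{\int} X = \inf \int Z$$

where the extrema are taken over all integrable (and measurable) $Y$ and $Z$ such that $Y \le X \le Z$ and define $\int X$ as above. Compare the two definitions.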
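
The lower and upper sums of the first definition are easy to compute for a given finite partition. The sketch below assumes a finite space with point masses, so that the finest partition already attains both extrema; `lower_upper` is a hypothetical helper returning the pair of sums for one partition.

```python
from fractions import Fraction

def lower_upper(X, mu, partition):
    """Lower and upper Darboux-Young sums of X for one finite partition.

    X, mu: dicts giving the value of the function and the mass at each point.
    partition: list of disjoint lists of points whose union is the space.
    """
    lower = sum(min(X[w] for w in A) * sum(mu[w] for w in A) for A in partition)
    upper = sum(max(X[w] for w in A) * sum(mu[w] for w in A) for A in partition)
    return lower, upper

X = {1: Fraction(0), 2: Fraction(1), 3: Fraction(4)}
mu = {w: Fraction(1, 3) for w in X}
```

Refining the partition raises the lower sum and lowers the upper sum; for the partition into single points both sums equal $\int X = \sum_\omega X(\omega)\mu\{\omega\}$.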


30. Completion approach. The Meray-Cantor method for completion of 
metric spaces adjoins to the given metric space elements which represent 
mutually convergent (in distance) sequences of its points. This method permits 
(Dunford) to define and study the integral of functions with values in an arbi- 
trary Banach space (Bochner), as follows: 

(i) Define the indefinite integral of a simple function as in the text. Since 
nonnegativity and infinite values may be meaningless, all simple functions 
under consideration are integrable. 

(ii) Adjoin to the space of these integrable functions $X_m, X_n, \cdots$ all functions $X$ such that $\int |X_m - X_n| \to 0$ and $X_n \xrightarrow{\mu} X$, by defining the indefinite integral of $X$ as the limit of the indefinite integrals of the $X_n$. To justify this definition, prove for simple functions those elementary properties of integrals which continue to have content for an arbitrary Banach space: $\int |X_m - X_n| \to 0$ if, and only if, $X_n \xrightarrow{\mu} X$ where $X$ is some measurable function, and $\int_A |X_n| \to 0$ uniformly in $n$ as $\mu A \to 0$; $\int |X_m - X_n| \to 0$ implies that $\varphi_n \to \varphi$ where $\varphi$ is $\sigma$-additive.

(iii) Extend the foregoing properties to all integrable functions and obtain 
the dominated convergence theorem. 

31. Kolmogorov's approach. Let $\mathcal{C}$ be a class closed under intersections. Let $D$, with or without affixes, be finite disjoint subclasses of $\mathcal{C}$. Order them by the relation $D_1 \prec D_2$ if every set of $D_2$ is contained in some set of $D_1$. Fix $A \in \mathcal{C}$ and consider all the $D$ which are partitions of $A$. They form a "direction" $\Delta$ in the sense that, if $D_1$ and $D_2$ are such partitions, then there exists such a partition which "follows" both, namely, $D_1 \cap D_2$.

Let $\varphi$ on $\mathcal{C}$ be a function, additive or not, single-valued or not. By definition,

$$\bar\varphi(A) = \int_A d\varphi = \lim \sum \varphi(A_k)$$

where the $A_k$ are elements of partitions $D$ of $A$ and the limit $\bar\varphi(A)$, if it exists, is "along the direction $\Delta$," that is, to every $\epsilon > 0$ there corresponds a $D_\epsilon$ such that $\left| \bar\varphi(A) - \sum \varphi(A_k) \right| < \epsilon$ for all $D \succ D_\epsilon$ and all values of the $\varphi(A_k)$ if $\varphi$ is multivalued. If $\bar\varphi(A)$ exists, it is unique. If $\bar\varphi$ on $\mathcal{C}$ exists, then it is finitely additive.

Compare this integral to the Riemann-Stieltjes integral by selecting conveniently $\varphi$.

Compare $\int_\alpha^\beta d\varphi$ with the length (if it exists) of the arc $\alpha\beta$ of a plane curve, by taking $\varphi(a_{k-1}, a_k) = \overline{a_{k-1} a_k}$, the length of the chord $a_{k-1}$ to $a_k$, the $\alpha = a_0, \cdots, a_{k-1}, a_k, \cdots, a_n = \beta$ being consecutive points on the arc $\alpha\beta$.

We say that $\varphi$ and $\varphi'$ on $\mathcal{C}$ are "differentially equivalent" on $A$ if, for every $\epsilon > 0$, there exists a partition $D_\epsilon$ of $A$ such that $\sum |\varphi(A_k) - \varphi'(A_k)| < \epsilon$ for all $D \succ D_\epsilon$. If $\varphi$ is finitely additive, then $\int_A d\varphi = \varphi(A)$. If not, then $\bar\varphi$ on $A \cap \mathcal{C}$ (if it exists) is the unique additive function differentially equivalent on $A$ to $\varphi$. Proceed as follows:

(i) $\varphi$ and $\varphi'$ are differentially equivalent on $A$ if, and only if, $\bar\varphi = \bar\varphi'$.

(ii) $\varphi$ and $\bar\varphi$ are differentially equivalent on $A$.

(iii) If finitely additive functions $\varphi$ and $\varphi'$ are differentially equivalent on $A$, then they coincide on $A$.

In all which precedes replace "finite" by "countable" and investigate the validity of the propositions so obtained. Compare the various definitions of the integral, by selecting conveniently $\varphi$.

Finally, take $\varphi$ with values in a fixed but arbitrary Banach space, and go over what precedes.

32. A structure of the concept of integration. The concept of integration is constructed by means of the concepts of summations and of passage to the limit along a direction or, more generally, a cut-direction. A bipartition $\bar\Delta = \Delta' + \Delta''$ of a set $\bar\Delta$ with an order relation $\prec$ is a "cut-direction" if $\Delta'$ and $\Delta''$ are directions and every element of $\Delta''$ follows every element of $\Delta'$.

Let $\varphi$ be a function, single-valued or not, on a direction $\Delta$ to a real line or a plane or, more generally, a Banach space. The element $\varphi_\Delta$ of the range space is "limit of $\varphi$ along $\Delta$" if, for every $\epsilon > 0$, there exists an $a_\epsilon \in \Delta$ such that $|\varphi_\Delta - \varphi(a)| < \epsilon$ for all $a \succ a_\epsilon$ and for all values of $\varphi(a)$. If the direction $\Delta$ is replaced by a cut-direction $\bar\Delta$, then $\varphi_{\bar\Delta}$ is "limit of $\varphi$ along $\bar\Delta$" if, for every $\epsilon > 0$ there exist $a'_\epsilon \in \Delta'$ and $a''_\epsilon \in \Delta''$ such that $|\varphi_{\bar\Delta} - \varphi(a)| < \epsilon$ for all $a$ such that $a'_\epsilon \prec a \prec a''_\epsilon$, and for all values of $\varphi(a)$. If $\varphi_\Delta$ or $\varphi_{\bar\Delta}$ exist, they are unique.

To every a € A assign some finite collection of points a; of a Banach space, 
not necessarily distinct and not necessarily uniquely determined. Form g(a) = 
>, v(a;). By definition, Ri is the limit, if it exists, of g along A. If A is re- 


placed by A, the definition continues to apply. 
Investigate all definitions of the integral you know of from this structural 
point of view, that is, the selections of A or A, and of the functions g. 


33. Daniell approach. Let $S$ be a family of bounded real-valued functions on $\Omega$, closed under finite linear combinations and lattice operations $f \cup g = \max(f, g)$, $f \cap g = \min(f, g)$. Then $f \in S \Rightarrow |f| = f \cup 0 - f \cap 0 \in S$. Suppose that on $S$ is defined an integral $\int$, a nonnegative linear functional continuous under monotone limits: $f \ge 0 \Rightarrow \int f \ge 0$, $\int (af + bg) = a \int f + b \int g$, $f_n \downarrow 0 \Rightarrow \int f_n \downarrow 0$.

a) Let $U$ be the family of limits (not necessarily finite) of nondecreasing sequences in $S$. $U$ contains $S$ and is closed under addition, multiplication by nonnegative constants, and lattice operations. Extend the integral on $U$, setting $\int f = \lim \int f_n$ when $S \ni f_n \uparrow f$ (infinite values being permitted).

The definition is justified, for if the nondecreasing sequences $f_n$ and $g_n$ in $S$ are such that $\lim f_n \le \lim g_n$, then $\lim \int f_n \le \lim \int g_n$.

If $U \ni f_n \uparrow f$ then $f \in U$ and $\int f_n \uparrow \int f$.

b) Let $-U$ be the family of functions $f$ such that $-f \in U$, and set

$$\int f = -\int (-f).$$

If $g \in -U$, $h \in U$ and $g \le h$, then $h - g \in U$ and $\int h - \int g = \int (h - g) \ge 0$.

By definition, $f$ is integrable if, for every $\epsilon > 0$, there exist $g_\epsilon \in -U$ and $h_\epsilon \in U$ such that $g_\epsilon \le f \le h_\epsilon$, $\int g_\epsilon$ and $\int h_\epsilon$ are finite, and $\int h_\epsilon - \int g_\epsilon < \epsilon$. Then $\inf \int h_\epsilon = \sup \int g_\epsilon$ and $\int f$ is defined to be this common value.

Let $L$ be the family of integrable functions. $L$ and the integral on $L$ have all the properties of $S$ and of the integral on $S$.

If $L \ni f_n \uparrow f$ and $\lim \int f_n < \infty$, then $f \in L$ and $\int f_n \uparrow \int f$.


Let $\mathfrak{F}$ be the smallest monotone family over $S$ (closed under monotone passages to the limit by sequences). $\mathfrak{F}$ is closed under algebraic and lattice operations. Let $L_1 = L \cap \mathfrak{F}$. $f \in L_1$ if and only if $f \in \mathfrak{F}$ and there exists $g \in L_1$ such that $|f| \le g$.

e) Let $\mathfrak{F}^+$ be the smallest monotone family over $S^+$ (consisting of all nonnegative functions of $S$). Set $\int f = \infty$ if $f \in \mathfrak{F}^+$ is not integrable. By definition, for $f \in \mathfrak{F}$, $\int f = \int f^+ - \int f^-$ exists if $f^+$ or $f^-$ is integrable.

If $\int f$ and $\int g$ exist and they are not infinite with opposite sign, then $\int (f + g)$ exists and equals $\int f + \int g$.

If $\int f_n$ exist, $\int f_1 > -\infty$, and $f_n \uparrow f$, then $\int f$ exists and $\int f_n \uparrow \int f$.

f) If $I_A \in \mathfrak{F}$, then, by definition, the measure of $A$ is $\mu A = \int I_A$.

If $I_A, I_B \in \mathfrak{F}$, then $I_{A \cup B}, I_{A \cap B}, I_{A - B} \in \mathfrak{F}$ and if the $I_{A_n} \in \mathfrak{F}$, then $I_{\sum A_n} \in \mathfrak{F}$ and $\mu \sum A_n = \sum \mu A_n$.

g) Suppose that $f \in S \Rightarrow f \cap 1 \in S$. Then $f \in \mathfrak{F} \Rightarrow f \cap 1 \in \mathfrak{F}$ and if $a > 0$, then $I_{[f > a]} \in \mathfrak{F}$.

If $f \ge 0$, $I_{[f > a]} \in \mathfrak{F}$ for every $a > 0$, then $f \in \mathfrak{F}$.

h) Suppose that $1 \in \mathfrak{F}$. Then $f \in \mathfrak{F}^+ \Rightarrow \int f = \int f \, d\mu$ where the right side is taken in the customary sense. What if $f \in \mathfrak{F}$?

i) The family $S$ is a real linear normed space with the uniform norm $\|f\| = \sup |f|$. Every bounded linear functional $\varphi(f)$ on this space is difference of two bounded nonnegative linear functionals $\varphi(f) = \varphi^+(f) - \varphi^-(f)$: Take $\varphi^+(f) = \sup \{\varphi(f'),\ 0 \le f' \le f\}$ on $S^+$, then extend to $S$ by linearity.

34. Riesz representation. Let $X$ be a locally compact space with points $x$,
compacts $K$, and the $\sigma$-field $\mathcal{S}$ of topological Borel sets $S$, with or without
subscripts. Let $C$ be the space of bounded continuous functions $g$, with or without
affixes, with the uniform norm $\|g\| = \sup |g|$. $C_0 \subset C$ consists of those $g$ which
vanish at infinity: given $\epsilon > 0$ there exists a $K_\epsilon$ such that $|g| < \epsilon$ on $K_\epsilon^c$.
$C_{00} \subset C$ consists of those $g$ which vanish off compacts and $C_K \subset C_{00}$ of those $g$
which vanish off $K$. If $X$ is compact, then $C_K = C_{00} = C_0 = C$.

a) Dini. If $g_n \in C_{00}$ and $g_n \downarrow 0$, then $g_n \downarrow 0$ uniformly, that is, $\|g_n\| \downarrow 0$.

b) Nonnegative linear functionals $\mu(g)$ on $C_{00}$ are bounded on every $C_K$ and are
integrals on $C_{00}$: Bounded, since there exists $g_0 \in C_{00}^+$ with $g_0 \ge 1$ on $K$,
hence $g \in C_K$ implies $|g| \le g_0 \|g\|$ and $|\mu(g)| \le \mu(g_0) \|g\|$. Integrals, since
$g_1 \in C_K$ and $g_n \downarrow 0$ imply $g_n \in C_K$, $\|g_n\| \downarrow 0$, hence $|\mu(g_n)| \le \mu(g_0) \|g_n\| \downarrow 0$.

c) There is a one-to-one correspondence between nonnegative linear functionals
$\mu(g)$ on $C_{00}$ and measures $\mu(S)$ bounded on compacts, given by
$\mu(g) = \int \mu(dx) g(x)$: By b) and 33, $\mu(g)$ determines the measure $\mu(S)$.

d) There is a one-to-one correspondence between bounded linear functionals
$\varphi(g)$ on $C_{00}$ and bounded signed measures $\varphi(S)$ on $\mathcal{S}$ given by
$\varphi(g) = \int \varphi(dx) g(x)$ with $\|\varphi\| = \operatorname{Var} \varphi$: Apply c) and 33i).

e) There is a one-to-one correspondence between bounded linear functionals on
$C_0$ and bounded signed measures on $\mathcal{S}$. Compactify and apply d).


Part Two


GENERAL CONCEPTS AND TOOLS OF
PROBABILITY THEORY


Probability concepts can be defined in terms of measure-theoretic 
concepts. Since probability is a normed measure and random variables 
are finite measurable functions, the properties of sequences of random 
variables are more precise than those of measurable functions on a 
general measure space. Since in probability theory probability spaces 
are but frames of reference for families of random variables, probability 
properties are to be expressed in terms of the laws of the families only. 
These laws are expressed in terms of distributions which are set func- 
tions on the Borel fields in the range spaces. The distributions are ex- 
pressed in terms of distribution functions which are point functions 
on the range spaces. In turn, to distribution functions correspond their 
Fourier-Stieltjes transforms (called characteristic functions) which are 
easier to deal with. 

The following Parts utilize the tools so developed to investigate 
probability problems. These problems are centered about the con- 
cepts of independence and of conditioning introduced in Parts III and 
IV, respectively. The corresponding sections 15 and 24 may be read 
immediately after section 9.


Chapter III


PROBABILITY CONCEPTS 


§ 9. PROBABILITY SPACES AND RANDOM VARIABLES 


9.1 Probability terminology. Probability theory has its own termi- 
nology, born from and directly related and adapted to its intuitive 
background; for the concepts and problems of probability theory are 
born from and evolve with the analysis of random phenomena. As a 
branch of mathematics, however, probability theory partakes of and 
contributes to the whole domain of mathematics and, at present, its 
general set-up is expressible in terms of measure spaces and measurable 
functions. We give below a first table of correspondences between the 
probability and measure theoretic terms. Within parentheses appear 
the abbreviations to be used throughout this book. 


probability space (pr. space)          normed measure space
elementary event                       point belonging to the space
event                                  measurable set
sure event                             whole space
impossible event                       empty set
probability (pr.)                      normed measure
almost sure, almost surely (a.s.)      almost everywhere
random variable (r.v.)                 finite numerical measurable function
expectation E                          integral $\int$


We shall use the pr. theory terms or the measure theory terms according
to our convenience. We summarize below in pr. terms the properties
which are specializations of those established in Part I.

I. A pr. space $(\Omega, \mathcal{A}, P)$ consists of the sure event $\Omega$, the (nonempty)
$\sigma$-field $\mathcal{A}$ of events and the pr. $P$ on $\mathcal{A}$. Unless otherwise stated, the pr.
space $(\Omega, \mathcal{A}, P)$ is fixed and $A, B, \cdots$, with or without affixes, represent
events. If so required, the pr. space can always be completed, so that
every subset of a null event becomes an event, necessarily null.


1° $\mathcal{A}$ is a $\sigma$-field: for all $A$'s, $A^c$, $\bigcup_{j=1}^{\infty} A_j$, $\bigcap_{j=1}^{\infty} A_j$ are events.

It follows that, for every sequence $A_n$, $\liminf A_n$, $\limsup A_n$, and
$\lim A_n$ (if it exists) are events.

2° $P$ is defined on $\mathcal{A}$ and, for all $A$'s,

$PA \ge 0$, $P(\sum A_j) = \sum PA_j$, $P\Omega = 1$.

It follows that

$P\emptyset = 0$, $PA \le PB$ when $A \subset B$, $P(\bigcup A_j) \le \sum PA_j$,
$P(\liminf A_n) \le \liminf PA_n \le \limsup PA_n \le P(\limsup A_n)$,

and, if $\lim A_n$ exists, then $P(\lim A_n) = \lim PA_n$.


II. A r.v. $X$ is a function on $\Omega$ to $R = (-\infty, +\infty)$ such that the inverse
images under $X$ of all Borel sets in $R$ are events; it suffices to require
the same of all intervals, or of all intervals $[a, b)$, or of all intervals
$(-\infty, b)$, etc.

An elementary r.v. is a function on $\Omega$ to $R$ of the form $X = \sum x_j I_{A_j}$,
where the $x_j$'s are finite numbers, the $A_j$'s are disjoint events, and $\sum A_j = \Omega$;
if there is only a finite number of distinct $x_j$'s, then $X$ is a simple r.v.

1° Every r.v. is the finite limit of a sequence of simple r.v.'s and the
finite uniform limit of a sequence of elementary r.v.'s; and conversely.

Every nonnegative r.v. is the finite limit of a nondecreasing sequence of
nonnegative simple r.v.'s; and conversely.

2° The class of all r.v.'s is closed under the usual operations of analysis,
provided these operations yield finite functions.

3° Every finite Borel function of a finite number of r.v.'s is a r.v.


A random function is a family of r.v.’s; if the family is finite, it is a 
random vector, and, if the family is denumerable, it is a random sequence, 
that is, a sequence of r.v.’s. 

III. Unless otherwise stated, $X, Y, \cdots$, with or without affixes, will
represent r.v.'s and, as usual, limits will be taken for $n \to \infty$.

$X_n$ converges in pr. to $X$, and we write $X_n \xrightarrow{P} X$, if, for every $\epsilon > 0$,

$P[|X_n - X| \ge \epsilon] \to 0$.

$X_n$ converges a.s. to $X$, and we write $X_n \xrightarrow{a.s.} X$, if $X_n \to X$, except
perhaps on a null event (event of pr. 0) or, equivalently, if for every
$\epsilon > 0$,

$P \bigcup_{k \ge n} [|X_k - X| \ge \epsilon] \to 0$.

Mutual convergence in pr. ($X_n - X_m \xrightarrow{P} 0$) and a.s. ($X_n - X_m \xrightarrow{a.s.} 0$)
are defined by replacing above $X_n - X$ by $X_n - X_m$ and $X_k - X$ by
$X_k - X_l$ with $k, l \ge n$, and taking limits as $m, n \to \infty$.

1° $X_n \xrightarrow{P} X$ if, and only if, $X_n - X_m \xrightarrow{P} 0$. $X_n \xrightarrow{a.s.} X$ if, and
only if, $X_n - X_m \xrightarrow{a.s.} 0$.

2° If $X_n \xrightarrow{a.s.} X$, then $X_n \xrightarrow{P} X$. If $X_n \xrightarrow{P} X$, then there is a
subsequence $X_{n_k} \xrightarrow{a.s.} X$ as $k \to \infty$, with

$\sum_{k=1}^{\infty} P\left[|X_{n_k} - X| \ge \frac{1}{2^k}\right] < \infty$.
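The two assertions of 2° can be illustrated on a classical example that is not part of the text above and is introduced here only as an assumption: on the unit interval with Lebesgue measure, let $X_n$ be the indicator of $[j/2^k, (j+1)/2^k)$ when $n = 2^k + j$, $0 \le j < 2^k$. The minimal sketch below (all function names are hypothetical) shows that the probabilities $P[|X_n| \ge \epsilon]$ tend to 0, that a fixed point is nevertheless hit once in every block, and that the subsequence $n_k = 2^k$ settles at 0.

```python
from fractions import Fraction

def block(n):
    """Return (k, j) with n = 2**k + j and 0 <= j < 2**k, for n >= 1."""
    k = n.bit_length() - 1
    return k, n - (1 << k)

def x(n, omega):
    """Value at omega of X_n, the indicator of [j/2^k, (j+1)/2^k)."""
    k, j = block(n)
    return 1 if Fraction(j, 2**k) <= omega < Fraction(j + 1, 2**k) else 0

def prob_nonzero(n):
    """P[|X_n| >= eps] for any 0 < eps <= 1: the length of the interval."""
    k, _ = block(n)
    return Fraction(1, 2**k)

omega = Fraction(1, 3)
# Convergence in pr. to 0: the probabilities decrease to 0.
probs = [prob_nonzero(n) for n in (1, 2, 4, 8, 16, 1024)]
# No a.s. convergence: X_n(omega) = 1 once in every block 2^k <= n < 2^(k+1).
hits = [n for n in range(1, 64) if x(n, omega) == 1]
# Along n_k = 2^k the indicator is that of [0, 2^-k); it vanishes at omega
# as soon as 2^-k <= omega, and the probabilities 2^-k have a finite sum.
subseq = [x(2**k, omega) for k in range(8)]
```

Exact rational arithmetic is used so that the interval endpoints are compared without rounding.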


The terms "integral" and "expectation" and the notations $\int$ and $E$
will be considered as equivalent. In the case of r.v.'s, we have

IV. The expectation of a simple r.v. $X = \sum_{k=1}^{n} x_k I_{A_k}$ is defined by

$EX = \sum_{k=1}^{n} x_k PA_k$.

The expectation of a nonnegative r.v. $X \ge 0$ is the limit of expectations
of nonnegative simple r.v.'s $X_n$ which converge nondecreasingly
to $X$:

$EX = \lim EX_n$, $0 \le X_n \uparrow X$.

The expectation of a r.v. $X = X^+ - X^-$ is given by

$EX = EX^+ - EX^-$,

provided the right-hand side is not of the form $+\infty - \infty$, and if $EX$
exists and is finite, $X$ is integrable.

1° $X$ is integrable if, and only if, $|X|$ is integrable.

If $X_1$ and $X_2$ are integrable and $a_1$ and $a_2$ are finite numbers, then
$a_1X_1 + a_2X_2$ is integrable and $E(a_1X_1 + a_2X_2) = a_1EX_1 + a_2EX_2$; if,
moreover, $X_1 \le X_2$, then $EX_1 \le EX_2$.

If $|X_1| \le X_2$ and $X_2$ is integrable, then $X_1$ is integrable; in particular,
every bounded r.v. is integrable, and if $X$ degenerates at $a$ ($X = a$
a.s.), then $EX = a$.
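The definition of the expectation of a simple r.v. and the linearity property in 1° can be checked by hand on a small example. The sketch below assumes a hypothetical finite pr. space (a fair die) and two hypothetical simple r.v.'s; none of these names or values come from the text.

```python
from fractions import Fraction

# Hypothetical finite pr. space: a fair die, P{omega} = 1/6 for omega = 1..6.
OMEGA = range(1, 7)
P = {omega: Fraction(1, 6) for omega in OMEGA}

def pr(event):
    """PA for an event A given as a set of points of OMEGA."""
    return sum(P[omega] for omega in event)

def expectation_simple(terms):
    """EX = sum of x_k * PA_k for X = sum of x_k * I_{A_k}, A_k disjoint."""
    return sum(x_k * pr(a_k) for x_k, a_k in terms)

def value(terms, omega):
    """Pointwise value of the simple r.v. at omega."""
    return next(x_k for x_k, a_k in terms if omega in a_k)

# X = 0 on {1, 2, 3}, 1 on {4, 5}, 10 on {6};  Y = 2 on evens, -1 on odds.
X = [(0, {1, 2, 3}), (1, {4, 5}), (10, {6})]
Y = [(2, {2, 4, 6}), (-1, {1, 3, 5})]
EX = expectation_simple(X)   # 0*(1/2) + 1*(1/3) + 10*(1/6)
EY = expectation_simple(Y)   # 2*(1/2) - 1*(1/2)

# Linearity, computed pointwise on OMEGA: E(3X - 2Y) = 3EX - 2EY.
E_comb = sum((3 * value(X, w) - 2 * value(Y, w)) * P[w] for w in OMEGA)
```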


The indefinite expectation $\varphi_X$ of a r.v. $X$ whose expectation exists is
defined on the $\sigma$-field $\mathcal{A}$ of events $A$ by $\varphi_X(A) = EXI_A$.

2° $\varphi_X$ on $\mathcal{A}$ is $\sigma$-finite, $\sigma$-additive, and $P$-continuous; if $X$ is integrable,
then $\varphi_X$ is bounded by $E|X|$, and $\varphi_X(A) \to 0$ as $PA \to 0$.

3° MONOTONE CONVERGENCE THEOREM. If $0 \le X_n \uparrow X$ finite or
not, then $EX_n \uparrow EX$; if $EX$ is finite, then the measurable function $X$ is a.s.
a r.v.

DOMINATED CONVERGENCE THEOREM. If $X_n \xrightarrow{a.s.} X$ and $|X_n| \le Y$
integrable, then $X$ is integrable, and $EX_n \to EX$.

FATOU-LEBESGUE THEOREM. If $Y$ and $Z$ are integrable r.v.'s and
$Y \le X_n$ or $X_n \le Z$, then

$E(\liminf X_n) \le \liminf EX_n$ or $\limsup EX_n \le E(\limsup X_n)$.

If, moreover, $\liminf EX_n$ or $\limsup EX_n$ is finite, then, respectively,
$\liminf X_n$ or $\limsup X_n$ is a.s. a r.v.

EQUIVALENCE. Two functions on $\Omega$ are equivalent if they agree outside
a null event. Convergences in pr. and a.s., integrals and integrability
are, in fact, defined for equivalence classes and not for individual
functions. Therefore, as long as we are concerned with a sequence of
r.v.'s we can consider every r.v. of the sequence as defined up to an
equivalence. In particular, we can then extend the notion of a r.v.
as follows: a r.v. is an a.s. defined, a.s. finite and a.s. measurable function.

Let us observe, once and for all, that when the measurable functions
under consideration are by definition $\mathcal{B}$-measurable where $\mathcal{B}$ is a sub
$\sigma$-field of events, then almost sure relations are $P_{\mathcal{B}}$-equivalences, that
is, valid up to null $\mathcal{B}$-measurable sets.

THE COMPLEX-VALUED CASE. A complex r.v. $X$ is of the form $X =
X' + iX''$ where $X'$ and $X''$ are "ordinary" or "real-valued" r.v.'s as
defined at the beginning of this section and where $i^2 = -1$; $X$ takes
its values in the complex plane of points $x' + ix''$, that is, in the plane
$R \times R$, and its expectation is the point $EX = EX' + iEX''$. In other
words, a complex r.v. $X$ is a representation of the random vector
$\{X', X''\}$. Similarly, a complex Borel function $g = g' + ig''$ is a
representation of the Borel vector $\{g', g''\}$. The definitions and properties
given below of random vectors, random sequences and, in general, random
functions extend at once to the complex case where the components,
instead of being ordinary r.v.'s, are complex-valued r.v.'s or,
equivalently, two-dimensional random vectors. The relation $|EX| \le
E|X|$ is still true; it suffices to use polar coordinates, setting $X = \rho e^{i\alpha}$,
$EX = re^{it}$, and observe that

$r = e^{-it} E\rho e^{i\alpha} = E\rho \cos(\alpha - t) \le E\rho$.
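The polar-coordinate argument can be followed numerically. The sketch below assumes a hypothetical complex r.v. taking four values with stated probabilities (illustrative data only) and verifies both the identity $r = E\rho\cos(\alpha - t)$ and the bound $|EX| \le E|X|$.

```python
import cmath
import math

# Hypothetical complex r.v. taking four values with the stated probabilities.
values = [1 + 2j, -3 + 0.5j, 2 - 1j, -0.5 - 0.5j]
probs = [0.1, 0.2, 0.3, 0.4]

EX = sum(p * x for p, x in zip(probs, values))
E_abs = sum(p * abs(x) for p, x in zip(probs, values))   # E rho = E|X|

# Polar forms: X = rho * e^{i alpha}, EX = r * e^{i t}.
r, t = cmath.polar(EX)
# r = E[rho cos(alpha - t)], the real part of e^{-it} EX.
r_check = sum(p * abs(x) * math.cos(cmath.phase(x) - t)
              for p, x in zip(probs, values))
```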


*9.2 Random vectors, sequences, and functions. A random vector
$X = (X_1, \cdots, X_n)$ is a finite family of r.v.'s called components of the
random vector. Every component $X_k$ induces a sub $\sigma$-field $\mathcal{B}(X_k)$ of
events, the inverse image of the Borel field in the range space $R_k$ of $X_k$.
The random vector has for range space the $n$-dimensional real space
$R^n = \prod_{k=1}^{n} R_k$ with points $x = (x_1, \cdots, x_n)$ and it induces a $\sigma$-field
$\mathcal{B}(X) = \mathcal{B}(X_1, X_2, \cdots, X_n)$, the inverse image of the Borel field in $R^n$. The
inverse images of intervals $(-\infty, x) \subset R^n$ are events

$[X < x] = [X_1 < x_1, \cdots, X_n < x_n] = \bigcap_{k=1}^{n} [X_k < x_k]$

and, hence, are intersections of events belonging to the $\mathcal{B}(X_k)$. Since
the Borel field $\mathcal{B}^n$ in $R^n$ is the minimal $\sigma$-field over the class of these
intervals, the $\sigma$-field $\mathcal{B}(X)$ is the minimal $\sigma$-field over these intersections
or, equivalently, over the union of the $\mathcal{B}(X_k)$: a compound or union
$\sigma$-field $\mathcal{B}(X_1, \cdots, X_n)$ with component $\sigma$-fields $\mathcal{B}(X_k)$. Thus, the elements
of $\mathcal{B}(X)$ are events and the random vector $X$ can be defined as a measurable
function on the pr. space to the $n$-dimensional Borel space $(R^n, \mathcal{B}^n)$.

We define $EX$ to be $(EX_1, EX_2, \cdots, EX_n)$, a point in the space $R^n$.


A random sequence $X = (X_1, X_2, \cdots)$ is a sequence of r.v.'s called
its components; it takes its values in the space $R^\infty = \prod_{n=1}^{\infty} R_n$ of points
$x = (x_1, x_2, \cdots)$, that is, the space of numerical sequences. To every
point $x$ with an arbitrary but finite number of finite coordinates $x_{k_1}, \cdots,
x_{k_n}$ there corresponds the interval $(-\infty, x)$ of all points $y$ such that
$y_{k_1} < x_{k_1}, \cdots, y_{k_n} < x_{k_n}$, and the minimal $\sigma$-field over the class of these
intervals is the Borel field $\mathcal{B}^\infty$ in $R^\infty$. Exactly as for random vectors,
it follows that the inverse image under $X$ of $\mathcal{B}^\infty$ is the minimal $\sigma$-field
over the class of all finite intersections of events $A_n \in \mathcal{B}(X_n)$, that is, the
compound or union $\sigma$-field $\mathcal{B}(X)$ with component $\sigma$-fields $\mathcal{B}(X_n)$; then we
write $\mathcal{B}(X) = \mathcal{B}(X_1, X_2, \cdots)$ and the random sequence can be defined
as a measurable function on the pr. space to the Borel space $(R^\infty, \mathcal{B}^\infty)$.
Similarly, the definition of the expectation of the random sequence is
$EX = \{EX_1, EX_2, \cdots\}$, when $EX_1, EX_2, \cdots$ exist.

A random function $X_T = (X_t, t \in T)$ is a family of r.v.'s $X_t$ where
$t$ varies over an arbitrary but fixed index set $T$. Exactly as above, the
range space of $X_T$ is the real space $R_T = \prod_{t \in T} R_t$ of points $x_T = (x_t,
t \in T)$, the space of numerical functions; intervals $(-\infty, x_T)$ are defined
for points $x_T$ with an arbitrary but finite number of finite coordinates
to be sets of all points $y_T < x_T$, that is, $y_t < x_t$, $t \in T$; the Borel
field $\mathcal{B}_T$ is the minimal $\sigma$-field over the class of these intervals. The random
function $X_T$ induces the compound or union $\sigma$-field $\mathcal{B}(X_T)$ with component
$\sigma$-fields $\mathcal{B}(X_t)$, that is, the minimal $\sigma$-field over the class of all finite
intersections of events $A_t \in \mathcal{B}(X_t)$ as $t$ varies on $T$ or, equivalently, the
inverse image under $X_T$ of the Borel field $\mathcal{B}_T$; and the random function
$X_T$ can be defined as a measurable function on the pr. space to the
Borel space $(R_T, \mathcal{B}_T)$. By definition, $EX_T = \{EX_t, t \in T\}$ is a numerical
function, when the $EX_t$ exist.

A Borel function $g_{T'}$ is a function on a Borel space $(R_T, \mathcal{B}_T)$ to a Borel
space $(R_{T'}, \mathcal{B}_{T'})$ such that the inverse image under $g_{T'}$ of the Borel
field in the range space is contained in the Borel field $\mathcal{B}_T$ in the domain
$R_T$. Therefore, if $X_T$ is a random function to $R_T$, then the function of
function $g_{T'}(X_T)$ on the pr. space to the Borel space $(R_{T'}, \mathcal{B}_{T'})$ induces
a sub $\sigma$-field of events, the inverse image under $X_T$ of the inverse image
under $g_{T'}$ of the Borel field $\mathcal{B}_{T'}$. Thus, $\mathcal{B}(g_{T'}(X_T)) \subset \mathcal{B}(X_T)$; in other
words, $g_{T'}(X_T)$ is $\mathcal{B}(X_T)$-measurable and, hence, is a random function.
We state this conclusion as a theorem.


A. BOREL FUNCTIONS THEOREM. A Borel function of a random function
is a random function which induces a sub $\sigma$-field of events contained
in the one induced by the original random function.


Loosely speaking, a Borel function of a random function induces a
"coarser" sub $\sigma$-field of events and has "fewer" values.

9.3 Moments, inequalities, and convergences. Expectations of pow- 
ers of r.v.’s are called moments and play an essential role in the investi- 
gations of pr. theory. They appear in the simple but powerful Markov 
inequality and in the definition of the very useful notion of convergence
"in the rth mean," that we shall introduce in this subsection. They
appear in the expansions of “‘characteristic functions” that we shall 
examine in the next chapter. They play a basic role in the study of 
sums of “independent”’ r.v.’s to which the next part is devoted. Fur- 
thermore, the powerful “truncation” method—to be used extensively 
in the following parts—expands tremendously the domain of applica- 
bility of the methods of investigation based upon the use of moments. 

$EX^k$ ($k = 1, 2, \cdots$) and $E|X|^r$ ($r > 0$) are called, respectively, the
kth moment and the rth absolute moment of the r.v. $X$. We may also
consider 0th moments but, for all r.v.'s, the 0th moments are 1, and
we shall limit ourselves to kth moments where $k$ is a positive integer,
and to rth absolute moments where $r$ is a positive number, unless otherwise
stated.

We establish now a few simple properties of moments. While a kth
moment may not exist, absolute moments always exist but may be infinite.
Since integrability is equivalent to absolute integrability, if the
kth absolute moment of $X$ is finite, then its kth moment exists and is
finite; and conversely. More generally, since $|X|^{r'} \le 1 + |X|^r$ for
$0 < r' \le r$, we have

a. If $E|X|^r < \infty$, then $E|X|^{r'}$ is finite for $r' \le r$ and $EX^k$ exists and
is finite for $k \le r$.

In other words, finiteness of a moment of $X$ implies existence and finiteness
of all moments of $X$ of lower order.

Upon applying the elementary inequality

$|a + b|^r \le c_r|a|^r + c_r|b|^r$, $r > 0$,

where $c_r = 1$ or $2^{r-1}$ according as $r \le 1$ or $r \ge 1$, replacing $a$ by $X$, $b$
by $Y$ and taking expectations of both sides, we obtain the

$c_r$-INEQUALITY. $E|X + Y|^r \le c_rE|X|^r + c_rE|Y|^r$, where $c_r = 1$ or
$2^{r-1}$ according as $r \le 1$ or $r \ge 1$.


This inequality shows that if the rth absolute moments of X and Y 
exist and are finite, so is the rth absolute moment of X + Y. 

Similarly, excluding the trivial case of vanishing $E|X|^r$ or $E|Y|^s$ (in
which case the Hölder inequality below is trivially true), and replacing
$a$ by $X/E^{1/r}|X|^r$, $b$ by $Y/E^{1/s}|Y|^s$ in the elementary inequality

$|ab| \le \frac{|a|^r}{r} + \frac{|b|^s}{s}$, $r > 1$, $\frac{1}{r} + \frac{1}{s} = 1$,

we obtain the

HÖLDER INEQUALITY. $E|XY| \le E^{1/r}|X|^r \cdot E^{1/s}|Y|^s$, where $r > 1$ and
$\frac{1}{r} + \frac{1}{s} = 1$.


From this inequality follows the

MINKOWSKI INEQUALITY. If $r \ge 1$, then

$E^{1/r}|X + X'|^r \le E^{1/r}|X|^r + E^{1/r}|X'|^r$.

In fact, upon excluding the trivial case $r = 1$, and applying the Hölder
inequality with $Y = |X + X'|^{r-1}$ to the right-hand side terms in the
obvious inequality

$E|X + X'|^r \le E(|X| \cdot |X + X'|^{r-1}) + E(|X'| \cdot |X + X'|^{r-1})$,

we find

$E|X + X'|^r \le (E^{1/r}|X|^r + E^{1/r}|X'|^r) \, E^{1/s}|X + X'|^{(r-1)s}$,

where $\frac{1}{r} + \frac{1}{s} = 1$. Upon excluding the trivial case of vanishing
$E|X + X'|^r$, noticing that $(r - 1)s = r$, and dividing both sides by
$E^{1/s}|X + X'|^r$, the asserted inequality follows.
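The three inequalities obtained so far can be checked numerically on a finite pr. space. The following is only a sketch under stated assumptions: the four elementary events, their probabilities and the values of $X$ and $Y$ are hypothetical, and a small tolerance absorbs floating-point rounding.

```python
def E(values, probs):
    """Expectation of a r.v. on a finite pr. space, listed point by point."""
    return sum(p * v for p, v in zip(probs, values))

def norm(values, probs, r):
    """E^{1/r}|X|^r."""
    return E([abs(v) ** r for v in values], probs) ** (1.0 / r)

def c(r):
    """Constant of the c_r-inequality: 1 if r <= 1, 2^(r-1) otherwise."""
    return 1.0 if r <= 1 else 2.0 ** (r - 1)

# Hypothetical data: four elementary events and two r.v.'s X, Y.
probs = [0.1, 0.2, 0.3, 0.4]
X = [3.0, -1.0, 0.5, 2.0]
Y = [-2.0, 4.0, 1.0, -0.5]
XY = [x * y for x, y in zip(X, Y)]
S = [x + y for x, y in zip(X, Y)]
TOL = 1e-12

checks = []
for r in (0.5, 1.0, 1.5, 2.0, 3.0):
    lhs = E([abs(v) ** r for v in S], probs)
    rhs = c(r) * (norm(X, probs, r) ** r + norm(Y, probs, r) ** r)
    checks.append(("c_r", r, lhs <= rhs + TOL))
for r in (1.5, 2.0, 3.0):
    s = r / (r - 1)                      # conjugate exponent: 1/r + 1/s = 1
    holder = E([abs(v) for v in XY], probs) <= norm(X, probs, r) * norm(Y, probs, s) + TOL
    minkowski = norm(S, probs, r) <= norm(X, probs, r) + norm(Y, probs, r) + TOL
    checks.append(("Holder", r, holder))
    checks.append(("Minkowski", r, minkowski))
```

Every entry of `checks` should carry `True`; a failing entry would indicate a transcription error in the exponents rather than a counterexample.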
Hölder's inequality with $r = s = 2$ is called the

SCHWARZ INEQUALITY: $E^2|XY| \le E|X|^2 \cdot E|Y|^2$.

Replacing $X$ by $|X|^{(r-r')/2}$ and $Y$ by $|X|^{(r+r')/2}$, with $r' \le r$, and taking
logarithms of both sides, we obtain the inequality

$\log E|X|^r \le \frac{1}{2} \log E|X|^{r-r'} + \frac{1}{2} \log E|X|^{r+r'}$,

which shows that

b. $\log E|X|^r$ is a convex function of $r$.
Hölder's inequality with $X$, $Y$, $r$, $s$ replaced respectively by $|X|^r$, 1,
$p/r$, $q/r$ (hence $\frac{1}{r} = \frac{1}{p} + \frac{1}{q}$) becomes $E^{1/r}|X|^r \le E^{1/p}|X|^p$ for $r < p$.
Hence,

c. $E^{1/r}|X|^r$ is nondecreasing in $r$.

In fact, $E^{1/r}|X|^r \uparrow E^{1/p}|X|^p$ as $r \uparrow p$. For, if $E|X|^p < \infty$, then
$|X|^r \le \max(1, |X|^p)$ and the dominated convergence theorem applies.
If $E|X|^p = \infty$, apply what precedes to $Y_n = |X| I_{[|X| < n]}$, then let
$n \uparrow \infty$.

We introduce now convergence in the rth mean. Let $X_n$ and $X$ be
r.v.'s with finite rth absolute moments, so that, by the $c_r$-inequality,
the same is true of $X_n - X$. We say that the sequence $X_n$ converges
to $X$ in the rth mean, and write $X_n \xrightarrow{r} X$, if $E|X_n - X|^r \to 0$.

Let $X_n \xrightarrow{r} X$. If $r \le 1$, then it follows, by the $c_r$-inequality, that

$|E|X_n|^r - E|X|^r| \le E|X_n - X|^r \to 0$,

and, if $r > 1$, then it follows, by the Minkowski inequality, that

$|E^{1/r}|X_n|^r - E^{1/r}|X|^r| \le E^{1/r}|X_n - X|^r \to 0$.

This proves that

d. If $X_n \xrightarrow{r} X$, then $E|X_n|^r \to E|X|^r$.


We conclude this subsection with a simple but basic inequality and 
a few of its applications. 


A. BASIC INEQUALITY. Let $X$ be an arbitrary r.v. and let $g$ on $R$ be a
nonnegative Borel function.
If $g$ is even and is nondecreasing on $[0, +\infty)$, then, for every $a \ge 0$,

$\frac{Eg(X) - g(a)}{\text{a.s.} \sup g(X)} \le P[|X| \ge a] \le \frac{Eg(X)}{g(a)}$.

If $g$ is nondecreasing on $R$, then the middle term is replaced by $P[X \ge a]$,
where $a$ is an arbitrary number.

The proof is immediate. Since $g$ is a Borel function on $R$, it follows
that $g(X)$ is a measurable function on $\Omega$ and, since $g$ is nonnegative on
$R$, its integral exists. If $g$ is even and is nondecreasing on $[0, +\infty)$,
then, setting $A = [|X| \ge a]$, from the obvious relations

$Eg(X) = \int_A g(X) + \int_{A^c} g(X)$

and

$g(a)PA \le \int_A g(X) \le \text{a.s.} \sup g(X) \cdot PA$, $0 \le \int_{A^c} g(X) \le g(a)$,

it follows that

$g(a)PA \le Eg(X) \le \text{a.s.} \sup g(X) \cdot PA + g(a)$.

This proves the first assertion and the second is similarly proved.

Applications. (1) Upon taking $g(x) = e^{rx}$ ($r > 0$), we obtain

$\frac{Ee^{rX} - e^{ra}}{\text{a.s.} \sup e^{rX}} \le P[X \ge a] \le e^{-ra}Ee^{rX}$.

(2) Upon taking $g(x) = |x|^r$ ($r > 0$) we obtain

$\frac{E|X|^r - a^r}{\text{a.s.} \sup |X|^r} \le P[|X| \ge a] \le \frac{E|X|^r}{a^r}$;

the right-hand side inequality is called the Markov inequality, and for
$r = 2$ it reduces to the celebrated Tchebichev inequality.
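Both sides of application (2) can be evaluated exactly for a small discrete r.v. The sketch below assumes a hypothetical example, the face of a fair die centered at 7/2, and uses rational arithmetic; the function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical r.v.: X = (face of a fair die) - 7/2, so that EX = 0.
faces = [Fraction(k) - Fraction(7, 2) for k in range(1, 7)]
p = Fraction(1, 6)

def abs_moment(r):
    """E|X|^r for a positive integer r."""
    return sum(p * abs(x) ** r for x in faces)

def tail(a):
    """P[|X| >= a]."""
    return sum(p for x in faces if abs(x) >= a)

def bounds(a, r):
    """(lower, upper) bounds of the basic inequality with g(x) = |x|^r."""
    sup_abs = max(abs(x) for x in faces)            # a.s. sup |X| = 5/2
    lower = (abs_moment(r) - a**r) / sup_abs**r
    upper = abs_moment(r) / a**r                    # Markov; Tchebichev if r = 2
    return lower, upper

results = {a: (bounds(a, 2), tail(a))
           for a in (Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5, 2))}
```

For small $a$ the upper bound exceeds 1 and is uninformative, while for large $a$ the lower bound becomes negative; the inequality is sharpest in between.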

Upon applying Markov's inequality with $X$ replaced by $X_n - X$, it
follows that

If $X_n \xrightarrow{r} X$, then $X_n \xrightarrow{P} X$, and if the $X_n$ are a.s. uniformly bounded,
then, conversely, $X_n \xrightarrow{P} X$ implies that $X_n \xrightarrow{r} X$.
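The boundedness assumption in the converse cannot simply be dropped. As a hedged illustration (the example is an assumption of this note, not taken from the text), let $U$ be uniform on $[0, 1)$ and compare $X_n = nI_{[U < 1/n]}$, which is unbounded in $n$, with $Y_n = I_{[U < 1/n]}$, which is bounded by 1. Both converge to 0 in pr., but only the second converges in the rth mean.

```python
from fractions import Fraction

def tail_prob(n, eps):
    """P[|X_n| >= eps] for X_n = n * I_[U < 1/n], U uniform on [0, 1)."""
    return Fraction(1, n) if n >= eps else Fraction(0)

def abs_moment(n, r):
    """E|X_n|^r = n^r * (1/n), r a positive integer."""
    return Fraction(n**r, n)

def bounded_moment(n, r):
    """E|Y_n|^r for Y_n = I_[U < 1/n]; it equals 1/n for every r > 0."""
    return Fraction(1, n)

ns = [1, 10, 100, 1000]
in_pr = [tail_prob(n, Fraction(1, 2)) for n in ns]   # tends to 0
first = [abs_moment(n, 1) for n in ns]               # stays equal to 1
second = [abs_moment(n, 2) for n in ns]              # grows like n
bounded = [bounded_moment(n, 2) for n in ns]         # tends to 0
```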


(3) Upon taking $g(x) = \frac{|x|^r}{1 + |x|^r}$ ($r > 0$), we obtain

$E\frac{|X|^r}{1 + |X|^r} - \frac{a^r}{1 + a^r} \le P[|X| \ge a] \le \frac{1 + a^r}{a^r} E\frac{|X|^r}{1 + |X|^r}$;

replacing $X$ by $X_n - X$ and by $X_m - X_n$, it follows that, as $m, n \to \infty$,

$X_n \xrightarrow{P} X$ if, and only if, $E\frac{|X_n - X|^r}{1 + |X_n - X|^r} \to 0$;

$X_m - X_n \xrightarrow{P} 0$ if, and only if, $E\frac{|X_m - X_n|^r}{1 + |X_m - X_n|^r} \to 0$.

REMARK. Observe that the function defined by $d(X, Y) = E\frac{|X - Y|}{1 + |X - Y|}$
has the triangular and identification properties of a
distance, except that $d(X, Y) = 0$ implies only that $X = Y$ a.s. It
follows from the foregoing proposition that

The space of the equivalence classes of the r.v.'s defined in a pr. space is
a complete metric space with distance $d$ defined by

$d(X, Y) = E\frac{|X - Y|}{1 + |X - Y|}$

and convergence in distance is equivalent to convergence in pr.

*CONVEX FUNCTIONS. The relations between moments established at
the beginning of this subsection are essentially convexity properties.
Let us recall a few classical properties of convex functions.

Let $g$ be a (numerical) Borel function defined on a finite or an infinite
open interval $I \subset R$. $g$ is said to be convex if, for every pair of
points $x$, $x'$ of $I$,

$g\left(\frac{x + x'}{2}\right) \le \frac{1}{2} g(x) + \frac{1}{2} g(x')$;

if $g$ is twice differentiable on $I$, then the convexity property is equivalent
to $g'' \ge 0$ on $I$. The same definition applies to $g$ on an $N$-dimensional
interval $I^N$ and is equivalent to the convexity of the function
$g(x + ux')$ of the numerical argument $u$ for all values of $u$ for which
$x + ux' \in I^N$, so that it suffices to consider convex functions on $I \subset R$.
A convex function on $I$ is either continuous on $I$ or is not a Borel function.
Thus, from now on, a convex function will be assumed to be continuous
on its domain. In that case, $g$ is convex on $I$ if, and only if,
to every $x_0 \in I$ there corresponds a number $\lambda(x_0)$ such that, for all
$x \in I$,

$\lambda(x_0)(x - x_0) \le g(x) - g(x_0)$.


Let $X$ be a r.v. whose values lie a.s. in $I$ and whose expectation $EX$
exists and is finite. Replacing $x_0$ by $EX$ and $x$ by $X$, and taking the
expectation of both sides of the foregoing inequality, it follows that

e. If $g$ is convex and $EX$ is finite, then

$g(EX) \le Eg(X)$.

If $g$ is strictly monotone, then this relation can be written

$EX \le g^{-1}(Eg(X))$.

For example, for $r \ge 1$, $g(x) = x^r$ ($x \in (0, +\infty)$) being convex, we have

$E|X| \le E^{1/r}|X|^r$.
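Proposition e can be observed on a finite example. The sketch below assumes a hypothetical positive r.v. on a four-point pr. space and three convex functions chosen for illustration; it also evaluates the power means $E^{1/r}|X|^r$ for a few values of $r \ge 1$.

```python
import math

# Hypothetical r.v. with values in (0, +inf) on a four-point pr. space.
values = [0.5, 1.0, 2.0, 4.0]
probs = [0.4, 0.3, 0.2, 0.1]

def E(g):
    """Eg(X) for a function g of the r.v. X."""
    return sum(p * g(x) for p, x in zip(probs, values))

EX = E(lambda x: x)
# Pairs (g(EX), Eg(X)) for three convex functions g on (0, +inf).
jensen = {
    "x^2": (EX**2, E(lambda x: x**2)),
    "exp": (math.exp(EX), E(math.exp)),
    "-log": (-math.log(EX), E(lambda x: -math.log(x))),
}
# Power-mean form: E|X| <= E^{1/r}|X|^r for r >= 1.
power_means = [E(lambda x, r=r: x**r) ** (1.0 / r) for r in (1, 2, 3, 4)]
```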

More generally, let $G_1$ and $G_2$ be two continuous and strictly increasing
functions such that $g = G_2G_1^{-1}$ is convex; we say then that $G_2$ is
convex in $G_1$. Since $Y = G_1(X)$ implies that $X = G_1^{-1}(Y)$, it follows
by e, upon assuming that $EX$ and $EY$ are finite, that

$G_2G_1^{-1}(EY) \le EG_2G_1^{-1}(Y)$

and, hence,

e'. If $G_2$ is convex in $G_1$, then

$G_1^{-1}(EG_1(X)) \le G_2^{-1}(EG_2(X))$.

For example, since on $(0, +\infty)$, $x^{r_2}$ is convex in $x^{r_1}$ for $r_2 \ge r_1$, that is,
the function $x^{r_2/r_1}$ is convex, we have

$E^{1/r_1}|X|^{r_1} \le E^{1/r_2}|X|^{r_2}$ for $r_2 \ge r_1$.

*9.4 Spaces $L_r$. The r.v.'s whose rth absolute moments are finite
are said to form the space $L_r$ over the pr. space $(\Omega, \mathcal{A}, P)$; in symbols,
$X \in L_r$ if $E|X|^r < \infty$; we drop $r$ if $r = 1$. We shall find later that
the space $L_2$ is a very important tool in the investigation of pr. problems,
especially those relative to sums of "independent" r.v.'s. It will
be convenient to introduce two boundary cases. The first is the trivial
space $L_0$ of all r.v.'s $X$ since $E|X|^0 = 1$ is finite. The second is the space
$L_\infty$ of all a.s. bounded r.v.'s. Since $\lim_{r \to \infty} E|X|^r < \infty$ if, and only if, $|X| \le 1$
a.s., it seems that only the subspace $L'_\infty \subset L_\infty$ of r.v.'s a.s. bounded
by 1 ought to be introduced. However, for $r \to \infty$ it is $\lim E^{1/r}|X|^r$
which counts, and this limit is finite if, and only if, $X$ is a.s. bounded.
In fact, let $s$ be the a.s. supremum of $|X|$, defined by $P[|X| > s] = 0$
and $P[|X| \ge c] > 0$ for every $c < s$; we have $s \le \infty$. The foregoing
assertion is implied by

a. $E^{1/\infty}|X|^\infty = \lim_{r \to \infty} E^{1/r}|X|^r = \text{a.s.} \sup |X| = s$.

For

$s \ge E^{1/r}|X|^r \ge E^{1/r}(|X|^r I_{[|X| \ge c]}) \ge c \, P^{1/r}[|X| \ge c] \to s$

as $r \to \infty$, then $c \uparrow s$.
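The monotone approach of $E^{1/r}|X|^r$ to the a.s. supremum can be watched numerically. The sketch below assumes a hypothetical r.v. whose largest value carries only pr. 0.01, so that the convergence is visibly slow; the rescaling by the supremum is a numerical device to avoid overflow and is not part of the argument above.

```python
# Hypothetical r.v.: |X| takes the values 0.5, 1, 3 with pr. 0.6, 0.39, 0.01.
values = [0.5, 1.0, 3.0]
probs = [0.6, 0.39, 0.01]

def r_norm(r):
    """E^{1/r}|X|^r, computed as s * (E|X/s|^r)^{1/r} to avoid overflow."""
    s = max(values)
    return s * sum(p * (v / s) ** r for p, v in zip(probs, values)) ** (1.0 / r)

rs = [1, 2, 5, 10, 50, 200, 1000]
norms = [r_norm(r) for r in rs]
# a.s. sup |X|: the largest value taken with positive pr.
as_sup = max(v for v, p in zip(values, probs) if p > 0)
```

The lower bound $c \, P^{1/r}[|X| \ge c]$ used in the proof explains the slow rate: with $c = 3$ it equals $3 \cdot (0.01)^{1/r}$, which is still below 2.99 at $r = 1000$.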
The foregoing definitions permit us to state 9.3a as follows:

b. $L_0 \supset L_r \supset L_s \supset L_\infty \supset L'_\infty$, $0 \le r \le s \le \infty$.

Let us observe that the space of all simple r.v.'s is a subspace of $L_\infty$
and, hence, of all the spaces $L_r$.
Since, by the $c_r$- and Minkowski inequalities and by a,

$E|X + Y|^r \le E|X|^r + E|Y|^r$, $0 < r < 1$,

$E^{1/r}|X + Y|^r \le E^{1/r}|X|^r + E^{1/r}|Y|^r$, $1 \le r \le \infty$,

and $E|X - Y|^r = 0$ if, and only if, $X$ and $Y$ are equivalent, we have,
according to the definitions relative to metric and normed spaces, the
following theorem.

A. The spaces $L_r$ are linear metric spaces with metric defined by

$d(X, Y) = E|X - Y|^r$ for $0 < r < 1$

and norm

$\|X\| = E^{1/r}|X|^r$ for $1 \le r \le \infty$,

provided equivalent r.v.'s are identified.


The problem arises whether the spaces $L_r$ are complete and what are
the convergence theorems in these spaces. Unless otherwise stated,
from now on $0 < r < \infty$ (the reader is invited to examine in each case
the boundary spaces $L_0$ and $L_\infty$).

First we observe that on account of A and 9.3d we have

c. Convergence in distance $d(X_n, X) \to 0$ in $L_r$ is equivalent to convergence
in the rth mean $X_n \xrightarrow{r} X$ and implies convergence of distances
$d(X_n, X_0) \to d(X, X_0)$ to any fixed $X_0 \in L_r$.

Also, if $X_n \in L_r$, then, for a r.v. $X$, $E|X_n - X|^r$, which always exists,
can converge to 0 only if, from some value of $n$ on, $E|X_n - X|^r$ is
finite and, hence, only if $X \in L_r$, so that

d. If $X_n$ is a sequence in $L_r$ and $E|X_n - X|^r \to 0$, then $X \in L_r$.

We are now in a position to prove the
We are now in a position to prove the 

B. $L_r$-COMPLETENESS THEOREM. Let the $X_n \in L_r$. Then $X_n \xrightarrow{r}$
some $X$ if, and only if, $X_m - X_n \xrightarrow{r} 0$, as $m, n \to \infty$.

Proof. If $X_n \xrightarrow{r} X$, then $X_m - X_n \xrightarrow{r} 0$, since, by the $c_r$-inequality,

$E|X_m - X_n|^r \le c_rE|X_m - X|^r + c_rE|X - X_n|^r \to 0$.

Conversely, if $X_m - X_n \xrightarrow{r} 0$, then, by the Markov inequality, for
every $\epsilon > 0$,

$P[|X_m - X_n| \ge \epsilon] \le \frac{1}{\epsilon^r} E|X_m - X_n|^r \to 0$ as $m, n \to \infty$,

so that $X_m - X_n \xrightarrow{P} 0$. Therefore, there is a subsequence $X_{n'} \xrightarrow{a.s.}$
some $X$ as $n' \to \infty$ and, for every fixed $m$, $X_m - X_{n'} \xrightarrow{a.s.} X_m - X$ as
$n' \to \infty$. Since $E|X_m - X_{n'}|^r \to 0$ as $m, n' \to \infty$, it follows, by the
Fatou-Lebesgue theorem and the hypothesis, that

$E|X_m - X|^r \le \liminf_{n'} E|X_m - X_{n'}|^r \to 0$ as $m \to \infty$.

Thus, $X_m \xrightarrow{r} X$, and the proof is complete.

If a r.v. $X$ is integrable, then the (indefinite) integral of $X$ is $P$-absolutely
continuous: $\int_A |X| \to 0$ as $PA \to 0$. Let $B = [|X| \ge a]$.
Since $PB \to 0$ as $a \to \infty$, it follows that $\int_B |X| \to 0$ as $a \to \infty$.
Conversely, this implies that

$\int_A |X| = \int_{AB} |X| + \int_{AB^c} |X| \le \int_B |X| + aPA \to 0$

as $PA \to 0$ then $a \to \infty$, and thus implies that $X$ is integrable, since,
given $\epsilon > 0$,

$\int |X| \le \int_B |X| + a < \epsilon + a < \infty$

for $a = a_\epsilon$ sufficiently large.
The integrals of r.v.'s $X_n$ are uniformly $P$-absolutely continuous or
simply uniformly continuous if $\int_A |X_n| \to 0$ uniformly in $n$ as $PA \to 0$;
in other words, for every $\epsilon > 0$ there exists a $\delta_\epsilon$ independent of $n$ such
that $\int_A |X_n| < \epsilon$ for any set $A$ with $PA < \delta_\epsilon$. Let $B_n = [|X_n| \ge a]$.
The r.v.'s $X_n$ are uniformly integrable if $\int_{B_n} |X_n| \to 0$ uniformly in
$n$, as $a \to \infty$. Observe that if the $\int |X_n|$ are uniformly bounded, say,
by $c$ ($< \infty$), then, by Markov's inequality, $PB_n \le c/a \to 0$ as $a \to \infty$.
Upon replacing $X$ by $X_n$ and $B$ by $B_n$ in the foregoing discussion, it
follows that

e. The r.v.'s $X_n$ are uniformly integrable if, and only if, their integrals
are uniformly bounded and uniformly continuous.
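Uniform boundedness of the integrals alone does not give uniform integrability. The following sketch, which rests on a hypothetical family of the form $c_n I_{[U < 1/n]}$ with $U$ uniform on $[0, 1)$ and on a finite truncation of the index range, compares the tail integrals over $B_n = [|X_n| \ge a]$ for $c_n = n$ and for $c_n = \sqrt{n}$.

```python
import math

def tail_integral(c_n, p_n, a):
    """Integral of |X_n| over B_n = [|X_n| >= a] for X_n = c_n * I_{A_n}, PA_n = p_n."""
    return c_n * p_n if c_n >= a else 0.0

def sup_tail(c, a, n_max=10**4):
    """Supremum over 1 <= n <= n_max (a truncation) of the tail integrals
    of c(n) * I_[U < 1/n]."""
    return max(tail_integral(c(n), 1.0 / n, a) for n in range(1, n_max + 1))

levels = [1, 10, 100]
# X_n = n I_[U < 1/n]: E|X_n| = 1 for all n, yet the tails do not vanish.
not_ui = [sup_tail(lambda n: float(n), a) for a in levels]
# Y_n = sqrt(n) I_[U < 1/n]: E|Y_n|^2 = 1, and the tails are at most 1/a.
ui = [sup_tail(math.sqrt, a) for a in levels]
```

In the first family the integrals are uniformly bounded but not uniformly continuous (take $A = [U < 1/n]$, whose pr. tends to 0 while $\int_A |X_n| = 1$), in agreement with e.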


Let $X_n \xrightarrow{r} X$, hence $X_nI_A \xrightarrow{r} XI_A$. It follows, by 9.3d and the above
lemma (take $A = \Omega$, and take $A$ such that $PA \to 0$), that

f. If $X_n \xrightarrow{r} X$, then the $|X_n|^r$ are uniformly integrable.


For use in the forthcoming theorem, note that (Young)

The Fatou-Lebesgue theorem and the dominated convergence theorem remain
valid if therein $Y$ and $Z$ are replaced by $U_n$ and $V_n$ with $U_n \xrightarrow{a.s.} U$,
$V_n \xrightarrow{a.s.} V$ and $\int U_n \to \int U$ finite, $\int V_n \to \int V$ finite.

For then, the argument pp. 125-6 remains valid. Furthermore, by selecting
$\{n''\}$ p. 126 so that also $U_{n''} \xrightarrow{a.s.} U$, we have

If $|X_n| \le U_n$, with $U_n \xrightarrow{P} U$ and $\int U_n \to \int U$ finite, then $X_n \xrightarrow{P} X$
implies that $\int X_n \to \int X$; in fact, $\int |X_n - X| \to 0$.


C. $L_r$-CONVERGENCE THEOREM. Let the $X_n \in L_r$. Then

(i) $X_n \xrightarrow{r} X$ if and only if (ii) $X_n \xrightarrow{P} X$

and one of the following conditions holds:

(iii) $\int |X_n|^r \to \int |X|^r < \infty$; (iv) the $|X_n|^r$ are uniformly integrable;

(v) the $|X_n|^r$, or (vi) the $|X_n - X|^r$, have uniformly continuous integrals.


Proof. Let $\epsilon > 0$ be arbitrary, set $A_n = [|X_n - X| \ge \epsilon]$, $A_{mn} =
[|X_m - X_n| \ge \epsilon]$, and let $m, n \to \infty$. We use the $c_r$-inequality without
further comment. Note that (iv) implies $X_n \in L_r$.

Condition (i) implies (ii) by Markov inequality ($PA_n \le E|X_n -
X|^r/\epsilon^r \to 0$) and implies (iii) by 9.3d. Conversely, (ii) and (iii) imply
(i), since then $|X_n - X|^r \le c_r|X_n|^r + c_r|X|^r = U_n$ with $U_n \xrightarrow{P}
2c_r|X|^r$ and $\int U_n \to 2c_r \int |X|^r < \infty$.

As for the remaining assertions, (i) implies (iv) by f, and (iv) implies
(v) by e applied to the $|X_n|^r$ in lieu of the $X_n$. Also, clearly (i) implies
(vi), and (vi) implies (v), since it implies integrability of $|X_n - X|^r$,
hence of $|X|^r$ (because $X_n \in L_r$), so that

$\int_A |X_n|^r \le c_r \int_A |X_n - X|^r + c_r \int_A |X|^r < \epsilon$

for $PA$ sufficiently small.
A 


Thus, to complete the proof, it suffices to show that (ii) and (v) imply
(i). Since convergence in pr. (in the rth mean) is equivalent to mutual
convergence in pr. (in the rth mean) and $X_n \xrightarrow{P} X$, $X_n \xrightarrow{r} Y$ imply
that $Y = X$ a.s., we can replace (i) and (ii) by (i') $E|X_m - X_n|^r \to 0$
and (ii') $PA_{mn} \to 0$. The assertion follows since, upon integrating
$|X_m - X_n|^r$ on $A_{mn}$ and on $A_{mn}^c$, (ii') and (v) imply that as $m, n \to \infty$,
then $\epsilon \to 0$,

$E|X_m - X_n|^r \le c_r \int_{A_{mn}} |X_m|^r + c_r \int_{A_{mn}} |X_n|^r + \epsilon^r \to 0$.

COROLLARY 1. $X_n \xrightarrow{r} X$ implies $X_n \xrightarrow{r'} X$ for $r' < r$.

Set $A_n = [|X_n - X| \ge 1]$ and observe that

$\int_A |X_n - X|^{r'} = \int_{AA_n} |X_n - X|^{r'} + \int_{AA_n^c} |X_n - X|^{r'} \le \int_A |X_n - X|^r + PA$.

COROLLARY 2. If $\sup E|X_n|^r = c < \infty$, then $X_n \xrightarrow{P} X$ implies
$X_n \xrightarrow{r'} X$ for $r' < r$.

Let $A_n = [|X_n| \ge a]$ and observe that

$\int_A |X_n|^{r'} = \int_{AA_n} |X_n|^{r'} + \int_{AA_n^c} |X_n|^{r'} \le ca^{r'-r} + a^{r'}PA < \epsilon$

by taking $a$ sufficiently large to have $ca^{r'-r} < \epsilon/2$ and, then, $PA$ sufficiently
small to have $a^{r'}PA < \epsilon/2$.

COROLLARY 3. If $|X_n| \le Y \in L_r$ for large $n$, then $X_n \xrightarrow{P} X$ implies
$X_n \xrightarrow{r} X \in L_r$.

Observe that for large $n$, $\int_A |X_n|^r \le \int_A |Y|^r$.

We proved in 9.3 a particular case of this corollary, with $Y = c < \infty$.

We summarize below the relations between various types of convergence:

$X_n \xrightarrow{a.s.} X \Rightarrow X_n \xrightarrow{P} X \Leftarrow X_n \xrightarrow{r'} X \Leftarrow X_n \xrightarrow{r} X$, $r' < r$.


The operation of integration on the complete normed linear space $L_r$
with $r \ge 1$ can be characterized as a functional of the integrand, as
follows:

D. INTEGRAL REPRESENTATION THEOREM. Let $\frac{1}{r} + \frac{1}{s} = 1$ with $1 \le r < \infty$.
A functional $f$ on $L_r$ is linear and continuous if, and only if, there exists
a r.v. $Y \in L_s$ such that $f(X) = EXY$ for every $X \in L_r$; then $f$ determines
$Y$ up to an equivalence and $\|f\| = E^{1/s}|Y|^s$.


Proof. Since $\frac{1}{r} + \frac{1}{s} = 1$ and $1 \le r < \infty$, it follows that
$1 < s \le \infty$, and we apply repeatedly Hölder's inequality
$E|XY| \le \|X\|_r \|Y\|_s$, where $\|X\|_r = E^{1/r}|X|^r$ and
$\|Y\|_s = E^{1/s}|Y|^s$ with $\|Y\|_\infty = \lim_{s \to \infty} \|Y\|_s = \text{a.s.} \sup |Y|$.

If $\|X\|_r \|Y\|_s$ is finite, then $f(X) = EXY$ exists, is finite, and
defines a normed functional $f$ on $L_r$ with $\|f\| \le \|Y\|_s$. Since $EXY$ is
linear in $X \in L_r$, so is $f(X)$. Being normed and linear, $f$ is continuous.

Conversely, let a functional $f$ on $L_r$ be continuous and linear; linearity
implies additivity and additivity implies $f(\theta) = 0$, where $\theta$ is the
zero-point of $L_r$, that is, the class of r.v.'s degenerate at 0. Therefore,
the set function $\varphi$ on $\mathcal{A}$ defined by $\varphi(A) = f(I_A)$ is
continuous and additive, hence $\sigma$-additive, and vanishes for null events,
hence is $P$-continuous. Thus, the Radon-Nikodym theorem applies and $\varphi$
on $\mathcal{A}$ determines up to an equivalence a r.v. $Y$ such that

$$f(I_A) = \varphi(A) = E I_A Y.$$

Since $f(X)$ and $EXY$ are both linear in $X$, it follows that $f(X) = EXY$
for all simple finite $X$ ($\in L_r$). If $Y \in L_s$ and
$L_r \ni X_n \xrightarrow{r} X$ hence $X_n Y \xrightarrow{1} XY$, then, by
continuity of $f$ and of $E$ on $L_1$, this equality extends to all $X \in L_r$.
Since $f$ has finite norm $\|f\| \le \|Y\|_s$, to complete the proof it suffices
to show that the reverse inequality $\|f\| \ge \|Y\|_s$ is true.

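Let $r > 1$. If the $X_n$ are simple finite and $0 \le X_n \uparrow |Y|$, then

$$E|X_n|^s \le E(X_n^{s-1} \operatorname{sign} Y)Y \le \|f\| \, \|X_n^{s-1}\|_r, \qquad \|X_n^{s-1}\|_r = E^{1/r} X_n^{(s-1)r},$$

yields, since $(s-1)r = s$,

$$\|Y\|_s \leftarrow \|X_n\|_s \le \|f\|.$$

Let $r = 1$. If there exists an $\epsilon > 0$ such that $\|Y\|_\infty > \|f\| + \epsilon$
and we set $A = [|Y| \ge \|f\| + \epsilon]$, then $PA > 0$ while

$$(\|f\| + \epsilon)PA \le E|I_A Y| = E(I_A \operatorname{sign} Y)Y \le \|f\| PA,$$

and we reach a contradiction. This completes the proof.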
Remark. The definitions and results of this subsection extend at
once to complex-valued r.v.'s.


§ 10. PROBABILITY DISTRIBUTIONS


10.1 Distributions and distribution functions. Let $X$ be a r.v. on
our pr. space $(\Omega, \mathcal{A}, P)$. The nonnegative set function $P_X$
defined on the Borel field $\mathcal{B}$ in $R$ by

$$P_X S = P[X \in S], \qquad S \in \mathcal{B},$$

is called the pr. distribution or, simply, distribution of $X$. Since $X$ is
finite, the inverse image under $X$ of $R$ is $\Omega$ and, since the inverse
image of a sum of Borel sets is the sum of their inverse images, we have

$$P_X R = 1, \qquad P_X\Big(\sum S_j\Big) = \sum P_X S_j, \qquad S_j \in \mathcal{B}.$$

Therefore, $P_X$ on $\mathcal{B}$ is a probability. Thus, the r.v. $X$ induces on
its range space a new pr. space $(R, \mathcal{B}, P_X)$, to be called a pr. space
induced by $X$ on its range space or the sample pr. space of $X$. Moreover,

a. The distribution $P_X$ of $X$ determines the distributions of all r.v.'s
$g(X)$ where $g$ is a finite Borel function on $R$; and
$Eg(X) = \int_R g \, dP_X$ in the sense that, if either side of this expression
exists, so does the other, and then they are equal.


Proof. Every finite Borel function $g(X)$ of a r.v. $X$ is a r.v. and,
by definition,

$$[g(X) \in S] = [X \in g^{-1}(S)]$$

where $S$ and $g^{-1}(S)$ are Borel sets. Therefore

$$P_{g(X)}(S) = P_X(g^{-1}(S)), \qquad S \in \mathcal{B},$$

and the first assertion is proved.

The second assertion will follow if we prove it for nonnegative functions
$g$. Because of the monotone convergence theorem, it suffices to
prove it for nonnegative simple functions $g$ and, because of the
additivity property of integrals, it suffices to prove the assertion for
indicators. Thus, let $g = I_S$, so that $g(X) = I_{[X \in S]}$. But, then, the
left-hand side of the asserted equality becomes

$$\int I_{[X \in S]} \, dP = P[X \in S],$$

while the right-hand side becomes $\int_R I_S \, dP_X = P_X S$. Therefore, by
definition of $P_X$, the asserted equality holds, and the proof is complete.
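As a minimal sketch of proposition a, the equality $Eg(X) = \int_R g \, dP_X$ can be checked exactly on a finite pr. space. The space (two fair dice), the r.v. `X` (sum of the faces) and the function `g` below are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical finite pr. space: two fair dice, each point of Omega has pr. 1/36.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """The r.v. under study: the sum of the two faces."""
    return w[0] + w[1]

def g(x):
    """A finite Borel function on R (here a polynomial)."""
    return (x - 7) ** 2

def induced_distribution(X, P):
    """P_X{x} = P[X = x]: the distribution of X on its range space."""
    PX = defaultdict(Fraction)
    for w, p in P.items():
        PX[X(w)] += p
    return dict(PX)

PX = induced_distribution(X, P)

lhs = sum(g(X(w)) * p for w, p in P.items())   # E g(X): integral over Omega
rhs = sum(g(x) * p for x, p in PX.items())     # integral of g dP_X over R
```

Both sums are computed with exact rational arithmetic, so they can be compared with `==`; the second one never refers to the original space, only to the induced distribution.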
Distributions are set functions and are not easy to handle by means
of classical analysis developed primarily to deal with point functions.
Thus, in order to be able to use analytical methods and tools, it is of
the greatest importance to find, and learn to use, point functions which
"represent" distributions, that is, which are in a one-to-one correspondence
with distributions. Such functions are obtained by the correspondence
theorem according to which, to the finite measure $P_X$ corresponds
one, and only one, interval function defined by

$$F_X[a, b) = P_X[a, b) = P[a \le X < b], \qquad [a, b) \subset R.$$

In turn, to this interval function corresponds one, and only one, class
of point functions on $R$ defined up to an additive constant, by

$$F_X(b) - F_X(a) = F_X[a, b), \qquad a < b \in R.$$

Recalling that $P_X$ is the distribution of a r.v. $X$, we select among all
those functions the function $F_X$ defined on $R$ by

$$F_X(x) = P_X(-\infty, x) = P[X < x], \qquad x \in R,$$

and call it the distribution function (d.f.) of $X$. Then, according to the
usual notational convention, the equality in a can be written
$Eg(X) = \int_R g \, dF_X$ and, if $g$ is integrable and continuous on $R$, then
the right-hand side L.-S.-integral becomes an improper R.-S.-integral.


b. The d.f. $F_X$ of a r.v. $X$ is nondecreasing and continuous from the
left on $R$, with $F_X(-\infty) = 0$ and $F_X(+\infty) = 1$. Conversely, every
function $F$ with the foregoing properties is the d.f. of a r.v. on some pr. space.

Proof. The first assertion follows from the fact that $P[X < x]$ does
not decrease as $x$ increases, approaches $P[X < x']$ as $x \uparrow x'$, and
approaches $P[X = -\infty] = 0$ or $P[X < +\infty] = 1$ according as $x \to -\infty$
or $x \to +\infty$. The converse follows by taking, say, for pr. space
$(R, \mathcal{B}, P)$ where $P$ is the pr. determined, according to the
correspondence theorem, by $F$. Then $F$ is the d.f. of the r.v. $X$ defined on
this pr. space by $X(x) = x$, $x \in R$.


Remark. There are pr. spaces on which there can be defined r.v.'s for
every function $F$ with the stated properties.

For example, take for the space $\Omega$ the interval $(0, 1)$, for the
$\sigma$-field of events the $\sigma$-field of all Borel sets in this interval,
and for pr. the Lebesgue measure on this $\sigma$-field. Then any function $F$
with the stated properties is the d.f. of an inverse function $X$ of $F$.
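The construction in the remark can be sketched for a step d.f. with finitely many atoms. The atoms below and the helper names are hypothetical; the inverse function is taken to be $X(\omega) = \sup\{x : F(x) \le \omega\}$, which is one possible choice of an inverse of a left-continuous $F$.

```python
from fractions import Fraction

# Hypothetical step d.f.: atoms (x_k, p_k), listed in increasing order of x_k.
atoms = [(-1, Fraction(1, 4)), (0, Fraction(1, 2)), (3, Fraction(1, 4))]

def F(x):
    """Left-continuous d.f.: F(x) = sum of p_k over the atoms x_k < x."""
    return sum((p for a, p in atoms if a < x), Fraction(0))

def X(omega):
    """Inverse function of F on Omega = (0, 1): sup{x : F(x) <= omega}."""
    cumulative = Fraction(0)
    for a, p in atoms:
        cumulative += p
        if omega < cumulative:
            return a
    raise ValueError("omega must lie in (0, 1)")

# X is constant on the intervals between consecutive cumulative sums.
breakpoints = [Fraction(0)]
for _, p in atoms:
    breakpoints.append(breakpoints[-1] + p)

def measure_X_below(x):
    """Lebesgue measure of {omega in (0, 1) : X(omega) < x}."""
    total = Fraction(0)
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        midpoint = (lo + hi) / 2          # any point of the interval would do
        if X(midpoint) < x:
            total += hi - lo
    return total
```

For every `x`, `measure_X_below(x)` agrees with `F(x)`, which is the statement that `F` is the d.f. of `X` under Lebesgue measure on $(0, 1)$.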
The weakest type of convergence of sequences of r.v.'s considered
so far is convergence in pr. In turn, it implies a type of convergence
of d.f.'s, as follows:

c. If $X_n \xrightarrow{P} X$, then $F_{X_n} \to F_X$ on the continuity set $C(F_X)$ of $F_X$.

Proof. Since

$$[X < x'] = [X_n < x, X < x'] + [X_n \ge x, X < x'] \subset [X_n < x] + [X_n \ge x, X < x'],$$

we have

$$F_X(x') \le F_{X_n}(x) + P[X_n \ge x, X < x'].$$

If $X_n - X \xrightarrow{P} 0$, then, for $x' < x$,

$$P[X_n \ge x, X < x'] \le P[|X_n - X| \ge x - x'] \to 0$$

and, hence,

$$F_X(x') \le \liminf F_{X_n}(x), \qquad x' < x.$$

Similarly, interchanging $X$ and $X_n$, $x$ and $x'$, we obtain

$$\limsup F_{X_n}(x) \le F_X(x''), \qquad x < x''.$$

Therefore, for $x' < x < x''$,

$$F_X(x') \le \liminf F_{X_n}(x) \le \limsup F_{X_n}(x) \le F_X(x'')$$

and, if $x \in C(F_X)$, it follows, letting $x' \uparrow x$ and $x'' \downarrow x$, that

$$F_X(x) = \lim F_{X_n}(x).$$

The same argument with $X'_n$ in lieu of $X$ and $x', x'' \in C(F_X)$ yields

d. If $X_n - X'_n \xrightarrow{P} 0$ and $F_{X'_n} \to F_X$ on $C(F_X)$, then
$F_{X_n} \to F_X$ on $C(F_X)$.

Particular case. There is an important case in which convergence in
pr. and convergence of d.f.'s are equivalent:

$X_n \xrightarrow{P} c$ if, and only if, $F_{X_n} \to 0$ or $1$ according as $x < c$ or $x > c$.

Follows by c and d.
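A small sketch of why proposition c is restricted to the continuity set: with the left-continuous convention $F(x) = P[X < x]$, the degenerate sequence $X_n = c - 1/n$ (an illustrative choice, not from the text) converges to $c$, yet $F_{X_n}(c)$ does not converge to $F_c(c)$.

```python
from fractions import Fraction

def df_degenerate(c):
    """Left-continuous d.f. of the r.v. degenerate at c: F(x) = P[c < x]."""
    return lambda x: 1 if c < x else 0

c = Fraction(0)
F = df_degenerate(c)

def F_n(n):
    """d.f. of X_n = c - 1/n, which converges to c surely, hence in pr."""
    return df_degenerate(c - Fraction(1, n))

ns = [10, 100, 1000]
right_of_c = [F_n(n)(Fraction(1, 2)) for n in ns]    # x = 1/2 > c, a continuity point of F
left_of_c = [F_n(n)(Fraction(-1, 2)) for n in ns]    # x = -1/2 < c, a continuity point of F
at_jump = [F_n(n)(c) for n in ns]                    # x = c, the only discontinuity of F
```

At the two continuity points the values already equal `F(x)` for these `n`, while `at_jump` stays at 1 although `F(c)` is 0.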


First Extension. Let $X = (X_1, \cdots, X_N)$ be a random vector or,
equivalently, a finite class of r.v.'s $X_1, \cdots, X_N$. The distribution of $X$
is defined on the Borel field $\mathcal{B}^N$ in the $N$-dimensional space
$R^N = \prod_{k=1}^{N} R_k$ by

$$P_X(S) = P[X \in S], \qquad S \in \mathcal{B}^N.$$

As for a r.v., $P_X$ is a pr. and the induced pr. space is $(R^N, \mathcal{B}^N, P_X)$.
Proposition a, with its proof, continues to be valid: the first part holds
for every finite Borel function $g$ on $R^N$ to some $R^{N'}$ and the second part
holds for every component of $g$.

The distribution function (d.f.) $F_X$ on $R^N$ of $X$ is still defined by

$$F_X(x) = P_X(-\infty, x) = P[X < x], \qquad x \in R^N,$$

or, more explicitly, by

$$F_{X_1, \cdots, X_N}(x_1, \cdots, x_N) = P[X_1 < x_1, \cdots, X_N < x_N].$$

$P_X$ determines the increment function of $F_X$ and, conversely, by

$$P_X[a, b) = F_X[a, b) = \Delta_{b-a} F_X(a), \qquad a < b \in R^N,$$

or, more explicitly, by

$$P[a_1 \le X_1 < b_1, \cdots, a_N \le X_N < b_N] = \Delta_{b_1 - a_1} \cdots \Delta_{b_N - a_N} F_{X_1, \cdots, X_N}(a_1, \cdots, a_N),$$

where $\Delta_{b_k - a_k}$, $k = 1, \cdots, N$, is the difference operator of step
$b_k - a_k$ operating on $a_k$.
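The increment formula can be sketched for a discrete random vector. Expanding the product of difference operators gives an alternating sum of $F$ over the $2^N$ vertices of $[a, b)$; the two-dice example and the function names below are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: (X1, X2) = faces of two independent fair dice.
points = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}

def F(x1, x2):
    """Joint d.f. F(x1, x2) = P[X1 < x1, X2 < x2] (strict inequalities)."""
    return sum((p for (u, v), p in points.items() if u < x1 and v < x2),
               Fraction(0))

def increment(F, a, b):
    """Apply the difference operators of steps b_k - a_k to F at the point a."""
    n = len(a)
    total = Fraction(0)
    for choice in product((0, 1), repeat=n):      # 1 -> use b_k, 0 -> use a_k
        vertex = [b[k] if choice[k] else a[k] for k in range(n)]
        sign = (-1) ** (n - sum(choice))          # one minus sign per a_k used
        total += sign * F(*vertex)
    return total

a, b = (2, 3), (5, 6)
via_df = increment(F, a, b)
direct = sum((p for (u, v), p in points.items()
              if a[0] <= u < b[0] and a[1] <= v < b[1]), Fraction(0))
```

Here `via_df` uses only the d.f., while `direct` sums the point masses in $[a, b)$; the two agree.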

Proposition b and its proof, as well as the remark, remain valid,
provided $F_X$ "nondecreasing" means that $\Delta_h F_X \ge 0$ for $h > 0$, that
is, $h_1 > 0, \cdots, h_N > 0$, and $x \to -\infty$ or $x \to +\infty$ means that one at
least of the $x_k \to -\infty$ or that all the $x_k \to +\infty$, respectively.

Proposition c and its proof remain valid, provided $X_n \xrightarrow{P} X$ means
that every one of the components $X_{nk} \xrightarrow{P} X_k$, $k = 1, \cdots, N$.
*Let $X = \{X_t, t \in T\}$ be an arbitrary random function or, equivalently,
an arbitrary class of r.v.'s $X_t$, $t \in T$. Then $X$ induces the pr.
space $(R^T, \mathcal{B}^T, P_X)$ (its sample pr. space), where
$R^T = \prod_{t \in T} R_t$ is the range space of $X$, $\mathcal{B}^T$ is the Borel
field in $R^T$, and $P_X$ is the distribution of $X$ defined by

$$P_X(S) = P[X \in S], \qquad S \in \mathcal{B}^T.$$

According to the consistency theorem, $P_X$ determines the consistent
family of the distributions $P_{X_{t_1}, \cdots, X_{t_N}}$ of all finite subfamilies
$(X_{t_1}, \cdots, X_{t_N})$ of the family $X$ and, conversely, a consistent family of
distributions on Borel fields of all finite subspaces $R_{t_1 \cdots t_N}$ of $R^T$
determines a distribution on $\mathcal{B}^T$. Similarly, the d.f. $F_X$ on $R^T$ is
defined by the consistent family of the d.f.'s $F_{X_{t_1}, \cdots, X_{t_N}}$ of all
finite subfamilies of the family $X$ and, conversely, a consistent family of
d.f.'s on all finite subspaces of $R^T$ defines a d.f. on $R^T$.

Remark. So far, the numerical functions under consideration were
r.v.'s, that is, finite (or a.s. finite) measurable functions. However,
the preceding definitions remain valid for nonfinite measurable functions,
provided the range-spaces are extended, that is, $R$, $R_k$, $R_t = (-\infty, +\infty)$
are replaced by $\bar{R}$, $\bar{R}_k$, $\bar{R}_t = [-\infty, +\infty]$. Thus, say, $R^N$
is replaced by $\bar{R}^N = \prod \bar{R}_k$ and, at the same time, $\mathcal{B}^N$ is
replaced by $\bar{\mathcal{B}}^N$, the Borel field in $\bar{R}^N$, and $P_X$ on
$\mathcal{B}^N$ is replaced by $P_X$ on $\bar{\mathcal{B}}^N$.
To fix the ideas, let $X$ be a numerical measurable function, not necessarily
finite. Since $\bar{\mathcal{B}}$ is determined by $\mathcal{B}$ and the sets
$\{-\infty\}$ and $\{+\infty\}$, $P_X$ on $\bar{\mathcal{B}}$ is determined by $P_X$ on
$\mathcal{B}$ and the values

$$P_X\{-\infty\} = P[X = -\infty], \qquad P_X\{+\infty\} = P[X = +\infty].$$

In fact, $P_X$ on $\bar{\mathcal{B}}$ is determined by the d.f. $F_X$ of $X$, defined by

$$F_X(x) = P[X < x] = P_X[-\infty, x), \qquad x \in R,$$

since

$$F_X(-\infty) = \lim_{x \to -\infty} F_X(x) = P[X = -\infty] \ge 0$$

and

$$F_X(+\infty) = \lim_{x \to +\infty} F_X(x) = P[X < +\infty] = 1 - P[X = +\infty] \le 1.$$
zr +0 


10.2 The essential feature of pr. theory. We are now in a position 
to describe the essential feature of pr. theory as distinct from measure 
theory. 

While pr. concepts are born from experience and, in their rough form, 
are perhaps older than the measure-theoretic ones, yet their rigorous 
formulation was given in this chapter in terms of and by specializing 
the measure-theoretic concepts. Thus, it looks as if, nowadays, pr. 
theory were a part of measure theory or, conversely, as if measure 
theory were a generalized and rigorous pr. theory. Therefore, it is im- 
portant to point out the basic distinction between these two interlock- 
ing branches of mathematics. The fact is that the distinction does not 
lie in the greater or lesser generality of the concepts, but in the proper- 
ties investigated in these branches of mathematics. 
Let us start with an analogy. Geometry, say, euclidean plane geom-
etry, appears to be a part of algebra and analysis, since we can consider 
a point in a plane as an ordered pair (x,y) of reals or as a complex 
number, a straight line as a linear equation in x and y, etc. Yet, geom- 
etry remains a science per se, not because it has its own terminology or 
is older than algebra and analysis, but because geometry studies those 
properties of sets of points that remain invariant under all the trans- 
formations which, say, preserve the distances; for example, euclidean 
displacements in the case of the euclidean geometry. And geometric 
terminology developed, frequently unconsciously, for this specific pur- 
pose is, on the whole, well adapted to the geometrical intuition, prob- 
lems, and methods. 

Now, measure theory investigates families of functions on a measure 
space to other spaces, distinct or not from the first. On the other hand, 
pr. theory has developed and continues to develop the intuition, prob- 
lems, and methods of its own in exploring those properties of families 
of functions which remain invariant under all the transformations which 
preserve their joint distributions—the reason being that the primary 
datum in random phenomena is not the pr. space but the joint distri- 
butions of the families of r.v.’s which describe the characteristics of 
the phenomena. Since the measurable characteristics are finite, pr. 
theory limited itself to r.v.’s (which, by definition, are finite). This 
explains the historical reason for the restrictions imposed on the meas- 
ure-theoretic setup of pr. theory. However, today pr. theory is suffi- 
ciently mature mathematically to show signs of getting rid of those 
restrictions, by considering more general families of functions on
measure spaces (normed or not) to more and more abstract spaces. We can
summarize the essential feature of pr. theory as follows:


A PROPERTY IS PR.-THEORETICAL IF, AND ONLY IF, IT IS DESCRIBABLE 
IN TERMS OF A DISTRIBUTION. 


In other words, 


A property of a family of functions on a measure space is pr.-theoretical
if, and only if, the property remains the same when the family is replaced
by any other family with the same distribution.


In particular, since in the numerical case a distribution is represented 
by the corresponding d.f.’s, we can say that 


- the pr.-theoretic properties of a r.v. $X$ are those which can be expressed
in terms of its d.f. $F_X$,

- the pr.-theoretic properties of a finite family $(X_1, X_2, \cdots, X_n)$ of
r.v.'s are those which can be expressed in terms of the joint d.f.
$F_{X_1, X_2, \cdots, X_n}$,

- the pr.-theoretic properties of any family $(X_t, t \in T)$ of r.v.'s are
those which can be expressed in terms of the joint d.f.'s of its finite
subfamilies.
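To illustrate the statement, here is a sketch with two r.v.'s defined on different pr. spaces but having the same d.f.; every quantity computed from the d.f. (here a few moments) coincides, whereas the underlying spaces do not. The coin and die spaces are hypothetical examples.

```python
from fractions import Fraction

# Space 1: one fair coin; X = 1 on heads, 0 on tails.
P1 = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
X = {"H": 1, "T": 0}

# Space 2: one fair die; Y = 1 on even faces, 0 on odd faces.
P2 = {k: Fraction(1, 6) for k in range(1, 7)}
Y = {k: 1 if k % 2 == 0 else 0 for k in range(1, 7)}

def df(rv, P):
    """Return the d.f. x -> P[rv < x] of a r.v. given as a dict on a finite space."""
    return lambda x: sum((P[w] for w in P if rv[w] < x), Fraction(0))

def moment(rv, P, r):
    """E|rv|^r computed on the original pr. space."""
    return sum(abs(rv[w]) ** r * P[w] for w in P)

F_X, F_Y = df(X, P1), df(Y, P2)
grid = [Fraction(k, 4) for k in range(-4, 9)]     # covers both jump points 0 and 1
same_df = all(F_X(x) == F_Y(x) for x in grid)
same_moments = all(moment(X, P1, r) == moment(Y, P2, r) for r in (1, 2, 3))
same_space = set(P1) == set(P2)
```

The moments are pr.-theoretic properties in the sense above; the identity of the sample points is not.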


More generally, consider a function $X$ on a pr. space $(\Omega, \mathcal{A}, P)$ to
some abstract space $\Omega'$. The class of all sets in $\Omega'$ whose inverse images
under $X$ are events is a $\sigma$-field $\mathcal{A}'$ in $\Omega'$; assign to
$A' \in \mathcal{A}'$ the number $P'A' = P(X^{-1}A')$. This defines the induced pr.
space $(\Omega', \mathcal{A}', P')$.
The pr.-theoretic properties of $X$ are those which can be expressed in terms
of $P'$ on $\mathcal{A}'$. If we limit ourselves to these properties only, we can speak
of a "stochastic variable" $X$ described by a "pr. law" represented by $P'$.
Those are the mathematical beings we are concerned with, and the
function $X$, the measure $P'$ (or the d.f.'s in the preceding cases) are
only various ways of talking about those beings in various languages.
It is important to realize fully that measurements of a stochastic variable
are relative to the induced pr. space; the original pr. space is but a
mathematical fiction. Yet it is basic, for it permits the use of a "common
frame of reference" for the families of stochastic variables we investigate:
the families of sub $\sigma$-fields of events they induce on the
original pr. space. However, precisely because of the existence of a
common frame of reference in the present setup, modern physics forces
us to introduce a different setup that we shall see in the next volume.


COMPLEMENTS AND DETAILS 


Notation. Unless otherwise stated, the pr. space $(\Omega, \mathcal{A}, P)$ is fixed, the
spaces $L_r$, $L_s$ ($r, s > 0$) are defined over the pr. space, and, with or without
affixes, $A, B, \cdots$ denote events, while $X, Y, \cdots$ denote r.v.'s.

1. Rewrite in pr. terms as many as possible of the complements and details
of Part I.

2. The convex function $\log E|X|^r$ of $r$ is linear if, and only if, $X$ is a
degenerate r.v.

3. Liapounov's inequality. Let $\mu_r = E|X|^r$. If $r \ge s \ge t \ge 0$, then
$\mu_r^{s-t} \mu_s^{t-r} \mu_t^{r-s} \ge 1$. When does this inequality become an equality? Prove
Hölder's inequality by means of properties of convex functions. When does
this inequality become an equality?

4. Investigate the possible behaviors of $E^{1/r}|X|^r$ as $r$ varies from $-\infty$ to $+\infty$.

5. Apply Markov's inequality to $X - \frac{a+b}{2}$ to obtain a bound for
$P[a \le X \le b]$. Also use the method of proof of the basic inequalities to obtain
various bounds for this pr.

6. If $g_0$ on $[0, +\infty)$ is a nonnegative Borel function such that $g_0(x) \ge g_0(\epsilon)$
for $x \ge \epsilon$, then $P[|X| \ge \epsilon] \le Eg_0(|X|)/g_0(\epsilon)$. Construct a function $g$ on
$[0, +\infty)$ with $g(0) = 0$, $g(\epsilon) = g_0(\epsilon)$, which is nondecreasing, continuous where
$g_0$ is continuous, and such that $Eg(|X|) \le Eg_0(|X|)$. Then the above bound
is at least as sharp with $g$ instead of $g_0$.

(Form $g_1(x) = \inf g_0(x')$ for $x' \ge x$ and $g(x) = \min\big(g_1(x), \frac{x}{\epsilon} g_0(\epsilon)\big)$.)

7. Let $g$ with $g(0) = 0$ be a continuous and nondecreasing function on $[0, +\infty)$.
If there exists an $h = h(Eg(|X|), \epsilon)$ such that $P[|X| \ge \epsilon] \le h \le Eg(|X|)/g(\epsilon)$
for all r.v.'s $X$, then $h = Eg(|X|)/g(\epsilon)$ for those $\epsilon > 0$ for which the bound is
of interest, that is, for which $Eg(|X|) < g(\epsilon)$. Loosely speaking, the bound
$Eg(|X|)/g(\epsilon)$ is the sharpest of all bounds which depend upon $Eg(|X|)$ and $\epsilon$.

(Take $|X| = \epsilon$ or $0$ with pr. $p$ and $q = 1 - p$ ($pq \ne 0$), respectively.)

8. For $\epsilon > 0$ sufficiently small, the bound $E|X|^r/\epsilon^r$ is at least as sharp as
the bound $E|X|^s/\epsilon^s$ with $s > r$.

9. Let

$d_0(X, Y) = \inf \{P[|X - Y| \ge \epsilon] + \epsilon\}$ for all $\epsilon > 0$;

$d_1(X, Y) = \inf \epsilon$ such that $P[|X - Y| \ge \epsilon] < \epsilon$;

$d_g(X, Y) = Eg(|X - Y|)$, $g$ on $[0, +\infty)$ is bounded continuous and increasing
with $g(0) = 0$ and $g(x + x') \le g(x) + g(x')$; for instance, take
$g(x) = \frac{cx}{1 + cx}$ with $c > 0$, $g(x) = 1 - e^{-x}$, or $g(x) = \tanh x$.

Each of the three functions $d_0$, $d_1$, $d_g$ is a metric on the space of all r.v.'s,
provided equivalent r.v.'s are identified. Convergence in pr. is equivalent to
convergence in any of the corresponding metric spaces.

10. (a) $\sum |X_n| < \infty$ a.s. if, and only if, the sequence of d.f.'s of consecutive
sums converges to the d.f. of a r.v.

(b) If $E \sum |X_n|^r < \infty$, then $\sum |X_n|^r < \infty$ a.s.

(c) Let $s = 1$ or $\frac{1}{r}$ according as $r < 1$ or $r \ge 1$. If $\sum E^s|X_n|^r < \infty$, then
$\sum |X_n| < \infty$ a.s.

11. $X_n \xrightarrow{P} X$ if, and only if, given $\epsilon > 0$ and $\delta > 0$, there exists $n(\epsilon, \delta)$ such
that $P[|X_n - X| \ge \epsilon] < \delta$ for $n \ge n(\epsilon, \delta)$.

(a) $X_n \xrightarrow{\text{a.s.}} X$ if, and only if, given $\epsilon > 0$ and $\delta > 0$, there exists $n(\epsilon, \delta)$
such that $P[|X_\nu - X| \ge \epsilon$ for some $\nu \ge n(\epsilon, \delta)] < \delta$.

(b) $X_n \to X$ uniformly except on a null event if, and only if, given $\epsilon > 0$ there
exists $n(\epsilon)$ such that $P[|X_n - X| \ge \epsilon] = 0$ for $n \ge n(\epsilon)$ or, equivalently,
$P[|X_\nu - X| \ge \epsilon$ for some $\nu \ge n(\epsilon)] = 0$.

12. $P[X_n \nrightarrow X] = \lim_{\epsilon \to 0} \lim_{n \to \infty} P \bigcup_{\nu \ge n} [|X_\nu - X| \ge \epsilon]$.

(a) If $\sum P[|X_n - X| \ge \epsilon] < \infty$ for every $\epsilon > 0$, then $X_n \xrightarrow{\text{a.s.}} X$.

(b) If $\sum E|X_n - X|^r < \infty$ for some $r > 0$, then $X_n \xrightarrow{\text{a.s.}} X$.

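13. $X_n \xrightarrow{\text{a.s.}} X$ if, and only if, there exists a sequence $\epsilon_n \to 0$ such that
$P \bigcup_{k \ge n} [|X_k - X| \ge \epsilon_k] \to 0$. (For the "only if" assertion select
$n_m \uparrow \infty$ by $P \bigcup_{k \ge n_m} [|X_k - X| \ge \frac{1}{m}] < \frac{1}{2^m}$ and take
$\epsilon_n = \frac{1}{m}$ for $n_m \le n < n_{m+1}$.) Let
$D$ be the set where the sequence $X_n$ does not converge to a finite function.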
PD = lim lim lim PU [| xX, -X|2 ¢


e—Om-ao n- 0 


PD = lim lim’ lim PU [| X, —Xm| 2 4] 
where lim’ denotes lim inf or lim sup indifferently. Can lim’ be replaced by 
lim? 

14. (a) If $\sum P[|X_{n+1} - X_n| \ge \epsilon_n] < \infty$ and $\sum \epsilon_n < \infty$, then the sequence
$X_n$ converges a.s. to a r.v.

(b) If $\sum \sup_p P[|X_{n+p} - X_n| \ge \epsilon] < \infty$ for every $\epsilon > 0$, or
$\sup_p P[|X_{n+p} - X_n| \ge \epsilon] \to 0$ and $\sum \liminf_p P[|X_{n+p} - X_n| \ge \epsilon] < \infty$
for every $\epsilon > 0$, then the sequence $X_n$ converges a.s. to a r.v. (In the last two
cases, $X_n \xrightarrow{P}$ some r.v. $X$ and $P[|X_n - X| \ge 2\epsilon]$ is bounded by the
corresponding term of each of the two series.)


15. Take $X_n = n^c$ or $0$ with pr. $\frac{1}{n}$ and $1 - \frac{1}{n}$ respectively, and investigate
convergences of the sequences $X_n$ and $E|X_n|^r$ according to the choice of $c$ and
of $r$.

16. If $F_{X_n} \to F_X$ on $C(F_X)$ and $Y_n \xrightarrow{P} c$, then $F_{X_n + Y_n} \to F_{X+c}$ on
$C(F_{X+c})$ (Slutsky).
What about $X_n Y_n$, $X_n/Y_n$ and in general $g(X_n, Y_n)$ where $g$ is continuous?
(Use 10.1d.)


1 ] , , 
17. Take Xon_1 = = Aon = = and investigate the sequences X, and Fyx,. 
Take $X_n = 0$ or $1$, each with pr. $\frac{1}{2}$, and $X = 1$ or $0$, each with pr. $\frac{1}{2}$. Then
$|X_n - X| = 1$ but $F_{X_n} = F_X$. To what converse is it a counterexample?


18. If the sequence $X_n$ converges a.s. to a nonfinite function, what can be
said about the sequence $F_{X_n}$?

19. Let $\{F_n\}$ be a denumerable family of d.f.'s with $F_n(-\infty) = 0$ and
$F_n(+\infty) = 1$. The family of all functions $F_{n_1, \cdots, n_m} = F_{n_1} \times \cdots \times F_{n_m}$ is a
consistent family of d.f.'s. Construct as many pr. spaces as you can, on which are
defined r.v.'s $X_n$ such that $F_{X_{n_1}, \cdots, X_{n_m}} = F_{n_1 \cdots n_m}$ for all finite index sets.

Extend what precedes to a family $\{F_t\}$ where $t$ ranges over an arbitrarily
given set $T$.

20. There is no universal pr. space for all possible r.v.’s on all possible pr. 
spaces. 

21. Extend as much as possible of this chapter and of the foregoing comple- 
ments and details to complex-valued r.v.’s and to complex vectors, by suitably 
interpreting the symbols used. 


Chapter IV


DISTRIBUTION FUNCTIONS AND 
CHARACTERISTIC FUNCTIONS 


§ 11. DISTRIBUTION FUNCTIONS 


11.1 Decomposition. In pr. theory, a distribution function (d.f.), to
be denoted by $F$, with or without affixes, is a nondecreasing function,
continuous from the left and bounded by 0 and 1 on $R$. This definition
entails at once that the quantities,

$$F(-\infty) = \lim_{x \to -\infty} F(x) = \inf F, \qquad F(+\infty) = \lim_{x \to +\infty} F(x) = \sup F,$$
$$F(x) = F(x - 0) = \lim_{x_n \uparrow x} F(x_n) = \sup_{x' < x} F(x'),$$
$$F(x + 0) = \lim_{x_n \downarrow x} F(x_n) = \inf_{x' > x} F(x'),$$

exist and are bounded by 0 and 1, and $x$ is a continuity or a discontinuity
point of $F$ according as $F(x + 0) - F(x - 0) = 0$ or $> 0$. As we
have seen, a d.f. is always the d.f. of a measurable function on a pr.
space, and if $F(-\infty) = 0$, $F(+\infty) = 1$, then it is the d.f. of a r.v.

The requirement of continuity from the left is of no importance,
since every nondecreasing function $F_1$ on $R$ bounded by 0 and 1
determines a d.f. $F$ by setting $F(x) = F_1(x)$ or $F(x) = F_1(x - 0)$ according
as $x$ is a continuity or a discontinuity point of $F_1$. In fact, even less
is necessary to determine a d.f.

Let $D$ denote a set dense in $R$ (for example, the set of all rationals)
and let $F_D$ denote a nondecreasing function on $D$ bounded by 0 and 1.
We can assume, without loss of generality, that it is continuous from
the left on $D$. Since, for every $x \in R$, there exists a sequence $\{x_n\} \subset D$
such that $x_n \uparrow x$, $x_n < x$, it follows easily that, according to the
definition of d.f.'s,

a. The function $F$ defined on $R$ by

$$F(x) = \lim_{x_n \uparrow x} F_D(x_n), \qquad x_n \in D, \; x_n < x,$$

is a d.f.

It follows that, if two d.f.'s coincide on a set dense in $R$, they coincide
everywhere. Furthermore, monotoneity of d.f.'s leads to the

A. DECOMPOSITION THEOREM. Every d.f. $F$ has a countable set of
discontinuity points and determines two d.f.'s $F_c$ and $F_d$ such that $F_c$ is
continuous, $F_d$ is a step-function, and $F = F_c + F_d$.


Proof. If $F$ has at least $n$ discontinuity points $x_k$,

$$a \le x_1 < x_2 < \cdots < x_n < b,$$

in a finite interval $[a, b)$, then, from

$$F(a) \le F(x_1) < F(x_1 + 0) \le \cdots \le F(x_n) < F(x_n + 0) \le F(b),$$

it follows, setting $p(x_k) = F(x_k + 0) - F(x_k)$, that

$$\sum_{k=1}^{n} p(x_k) = \sum_{k=1}^{n} \{F(x_k + 0) - F(x_k)\} \le F(b) - F(a).$$

Therefore, the number of discontinuity points $x$ in $[a, b)$ with jumps
$p(x) > \epsilon > 0$ is bounded by $\frac{1}{\epsilon}\{F(b) - F(a)\}$. Thus, for every integer
$m$, the number of discontinuity points with jumps greater than $\frac{1}{m}$ is
finite and, hence, there is no more than a countable set of discontinuity
points in every finite interval $[a, b)$. Since $R$ is a denumerable sum of
such intervals, the same is true of the set of all discontinuity points,
and the first assertion is proved. Furthermore, denoting the discontinuity
set by $\{x_n\}$, we have, for every interval $[a, b)$, finite or not,

$$\sum_{a \le x_n < b} p(x_n) \le F(b) - F(a).$$

Upon defining $F_d$ by

$$F_d(x) = \sum_{x_n < x} p(x_n), \qquad x \in R,$$

and setting $F_c = F - F_d$, it follows at once that $F_d$ and $F_c$ are d.f.'s.
But, for $x < x'$,

$$F_c(x') - F_c(x) = F(x') - F(x) - \sum_{x \le x_n < x'} p(x_n) = F(x') - F(x + 0) - \sum_{x < x_n < x'} p(x_n)$$

so that, letting $x' \downarrow x$, we obtain

$$F_c(x + 0) - F_c(x) = 0;$$

thus $F_c$ is also continuous from the right and hence continuous.
Finally, if there are two such decompositions of $F$,

$$F = F_c + F_d = F'_c + F'_d,$$

then $F_c - F'_c = F'_d - F_d$, and both sides must vanish since the left-hand
side is continuous while the right-hand side is discontinuous, except
when it vanishes identically. This completes the proof.
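The decomposition can be sketched on a d.f. whose one-sided limits are known in closed form. The particular $F$ below (half of the mass uniform on $[0, 1)$, atoms of size $1/4$ at $1/3$ and $2/3$) and the helper names are assumptions made for illustration; the discontinuity set is taken as known rather than detected.

```python
from fractions import Fraction

# Hypothetical d.f.: mass 1/2 spread uniformly on [0, 1), plus two atoms.
jumps = {Fraction(1, 3): Fraction(1, 4), Fraction(2, 3): Fraction(1, 4)}

def uniform_part(x):
    return Fraction(1, 2) * min(max(Fraction(x), Fraction(0)), Fraction(1))

def F(x):
    """Left-continuous d.f.: an atom at x_n is counted only for x > x_n."""
    return uniform_part(x) + sum((p for a, p in jumps.items() if a < x), Fraction(0))

def F_right(x):
    """F(x + 0): the atom at x itself is now included."""
    return uniform_part(x) + sum((p for a, p in jumps.items() if a <= x), Fraction(0))

def p(x):
    """Jump of F at x: p(x) = F(x + 0) - F(x)."""
    return F_right(x) - F(x)

discontinuities = sorted(jumps)      # assumed known for this sketch

def F_d(x):
    """Step part: sum of the jumps p(x_n) over x_n < x."""
    return sum((p(a) for a in discontinuities if a < x), Fraction(0))

def F_c(x):
    """Continuous part F_c = F - F_d."""
    return F(x) - F_d(x)

def F_c_right(x):
    """F_c(x + 0) = F(x + 0) - F_d(x + 0)."""
    return F_right(x) - sum((p(a) for a in discontinuities if a <= x), Fraction(0))
```

At each discontinuity point of `F`, `F_c_right` and `F_c` agree, which is the right-continuity step of the proof; the jump count also respects the bound $\{F(b) - F(a)\}/\epsilon$.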


REMARK. Since the discontinuity set of a d.f. is countable, its
continuity set is always dense in $R$. However, the discontinuity set can
also be dense in $R$. For example, let $\{r_n\}$ be the set of all rationals in
$R$ (it is dense in $R$); if $p(r_n) = \frac{6}{\pi^2 n^2}$, then the function $F$ defined by

$$F(x) = \sum_{r_n < x} p(r_n), \qquad x \in R,$$

is a d.f. and, in fact, is the d.f. of a r.v., since $F(-\infty) = 0$ and $F(+\infty) = 1$.

FURTHER DECOMPOSITION. $F_c$ determines, by $\mu_c(-\infty, x) = F_c(x) - F_c(-\infty)$,
a finite measure $\mu_c$ on the Borel field $\mathcal{B}$ in $R$. Upon applying
to $\mu_c$ the Lebesgue decomposition theorem with respect to the Lebesgue
measure on $\mathcal{B}$ we obtain

$$\mu_c = \mu_{ac} + \mu_s, \qquad \mu_{ac}(S) = \int_S g(x) \, dx, \qquad S \in \mathcal{B},$$

where $g \ge 0$ is a Borel function and $\mu_s = 0$ on the complement of some
Lebesgue-null set $N_s$. It follows that there are d.f.'s $F_{ac}$ and $F_s$ which
correspond to the measures $\mu_{ac}$ and $\mu_s$, respectively, such that

$$F_c = F_{ac} + F_s, \qquad F_{ac}(x) = \int_{-\infty}^{x} g(x') \, dx',$$

and $F_s$ is a continuous d.f. whose points of increase all lie in $N_s$. Thus

A'. Every d.f. $F$ determines three d.f.'s of which $F$ is the sum:

- the step part $F_d$ which is a step function,

- the absolutely continuous part $F_{ac}$ such that

$$F_{ac}(x) = \int_{-\infty}^{x} g(x') \, dx', \qquad g \ge 0, \quad x \in R,$$

- the singular part $F_s$ which is a continuous function with points of
increase all belonging to a Lebesgue-null set.


11.2 Convergence of d.f.'s. As 10.1c and 11.1a suggest, convergence
of d.f.'s to a d.f. $F$ ought to be defined without taking into account
what happens on the discontinuity set of $F$.

We say that a sequence $F_n$ of d.f.'s converges weakly to a d.f. $F$ and
write $F_n \xrightarrow{w} F$, if $F_n \to F$ on the continuity set $C(F)$ of $F$. This
definition is justified, that is, the weak limit, if it exists, is unique, since
$F_n \xrightarrow{w} F$ and $F_n \xrightarrow{w} F'$ imply $F = F'$ on the set $C(F) \cap C(F')$ and, on
the remaining set, which, by 11.1A, is countable, $F = F'$ by continuity
from the left.

We say that a sequence $F_n$ of d.f.'s converges completely and write
$F_n \xrightarrow{c} F$, if $F_n \xrightarrow{w} F$ and $F_n(\mp\infty) \to F(\mp\infty)$. Weak convergence does
not imply complete convergence. For example, given a d.f. $F_0$ with at
least one point of increase so that $F_0(-\infty) \ne F_0(+\infty)$, let
$F_n(x) = F_0(x + n)$. Then $F_n \to F_0(+\infty)$ and the weak convergence holds but
not the complete convergence. However, in the case of weak convergence
we have


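a. Let $F_n \xrightarrow{w} F$. Then

$$\limsup F_n(-\infty) \le F(-\infty) \le F(+\infty) \le \liminf F_n(+\infty),$$
$$\operatorname{Var} F \le \liminf \operatorname{Var} F_n,$$

and $F_n \xrightarrow{c} F$ if, and only if, $\operatorname{Var} F_n \to \operatorname{Var} F$ or
$\operatorname{Var} F_n - F_n[-a, +a) \to 0$ uniformly in $n$ as $a \to \infty$.

For, from

$$F_n(-\infty) \le F_n(x) \le F_n(+\infty),$$

it follows that, for $x \in C(F)$,

$$\limsup F_n(-\infty) \le F(x) \le \liminf F_n(+\infty)$$

and, letting $x \to \mp\infty$ along $C(F)$, the first inequalities are proved.
Thus

$$\operatorname{Var} F = F(+\infty) - F(-\infty) \le \liminf (F_n(+\infty) - F_n(-\infty)) = \liminf \operatorname{Var} F_n,$$

and the second assertion follows from the same inequalities.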
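The example $F_n(x) = F_0(x + n)$ given before proposition a can be sketched numerically. Here $F_0$ is taken, as an illustrative assumption, to be the d.f. degenerate at 0, and $\pm\infty$ is approximated by a large finite value, which is exact for these step functions.

```python
def F0(x):
    """Left-continuous d.f. degenerate at 0: a single point of increase."""
    return 1.0 if x > 0 else 0.0

def F_n(n):
    """F_n(x) = F0(x + n): the unit mass sits at -n and escapes to -infinity."""
    return lambda x: F0(x + n)

def variation(F, big=10**9):
    """Var F = F(+inf) - F(-inf), evaluated at +-big (exact here because the
    only jump of each F considered lies well inside (-big, big))."""
    return F(big) - F(-big)

def limit(x):
    """The weak limit F: the constant F0(+inf) = 1."""
    return 1.0

xs = [-50.0, -1.0, 0.0, 2.5, 40.0]
n = 1000
pointwise = [F_n(n)(x) for x in xs]   # already equal to the limit at these points
var_limit = variation(limit)          # Var F
var_Fn = variation(F_n(n))            # Var F_n
```

The variation of the limit is strictly smaller than that of every `F_n`, in line with $\operatorname{Var} F \le \liminf \operatorname{Var} F_n$; the convergence is weak but not complete.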


We still have to find a way to recognize whether a given sequence
$F_n$ of d.f.'s converges, weakly or completely.

b. A sequence $F_n$ of d.f.'s converges weakly if, and only if, it converges
on a set $D$ dense in $R$.

Proof. The "only if" assertion follows from the fact that the continuity
set of a d.f. is dense in $R$. As for the "if" assertion, let
$F_D = \lim F_n$ on $D$. The relation of 11.1a determines a d.f. $F$ on $R$. Since,
for $x' < x < x''$,

$$F_n(x') \le F_n(x) \le F_n(x''),$$

it follows that, for $x', x'' \in D$,

$$F_D(x') \le \liminf F_n(x) \le \limsup F_n(x) \le F_D(x'').$$

Taking $x \in C(F)$ and letting $x' \uparrow x$ and $x'' \downarrow x$ along $D$, we obtain

$$F(x) = \lim F_n(x), \qquad x \in C(F),$$

and the "if" assertion is proved.
We are now in a position to prove the basic Helly 


A. WEAK COMPACTNESS THEOREM. Every sequence of d.f.’s is weakly 
compact. 


We recall that (at least here) a set is compact in the sense of a type of 
convergence if every infinite sequence in the set contains a subsequence 
which converges in the same sense. 

Proof. It suffices to show that, if $F_n$ is a sequence of d.f.'s, then there
is a subsequence which converges weakly. According to b, it suffices
to prove that there is a subsequence which converges on a set $D$ dense
in $R$.

Let $D = \{x_n\}$ be an arbitrary countable set dense in $R$, say, the set
of all rationals. All terms of the numerical sequence $F_n(x_1)$ lie between
0 and 1 and, therefore, by the Bolzano-Weierstrass compactness lemma,
this sequence contains a convergent subsequence $F_{n1}(x_1)$. Similarly,
the numerical sequence $F_{n1}(x_2)$ contains a convergent subsequence
$F_{n2}(x_2)$ and the sequence $F_{n2}(x_1)$ converges, and so on. It follows
that the "diagonal" sequence $F_{nn}$ of d.f.'s, contained in all the
subsequences $\{F_{n1}\}$, $\{F_{n2}\}$, $\cdots$, converges on $D$, and the proof is complete.


B. COMPLETE COMPACTNESS CRITERION. A sequence $F_n$ of d.f.'s is
completely compact if, and only if, it is equicontinuous at infinity:
$\operatorname{Var} F_n - F_n[-a, +a) \to 0$ uniformly in $n$ as $a \to +\infty$.

Proof. The "if" assertion is immediate. As for the "only if" assertion,
if the $F_n$ are not equicontinuous at infinity, then, by a and A, there
exists a subsequence $F_{n'}$ which converges weakly but not completely.
Note that our "complete" convergence is frequently called "weak" and
our "weak" is sometimes replaced by "vague."


11.3 Convergence of sequences of integrals. Let $g$ denote a function
continuous on $R$ and let $F$, with or without affixes, denote a d.f.
We intend to investigate conditions under which weak or complete
convergence of a sequence $F_n$ implies convergence of the corresponding
sequence of integrals $\int g \, dF_n$, when these integrals exist. Let us
observe that these integrals do not change if arbitrary constants are added
to the d.f.'s. The investigation is centered upon the basic


a. HELLY-BRAY LEMMA. If $F_n \xrightarrow{w} F$ up to additive constants, then,
for every pair $a < b$ such that $F_n(a) \to F(a)$ and $F_n(b) \to F(b)$,

$$\int_a^b g \, dF_n \to \int_a^b g \, dF.$$

Proof. Setting $g_m = \sum_{k=1}^{k_m} g(x_{mk}) I_{[x_{mk}, x_{m,k+1})}$, where

$$a = x_{m1} < x_{m2} < \cdots < x_{m,k_m+1} = b$$

and $\Delta_m = \sup_k (x_{m,k+1} - x_{mk}) \to 0$ as $m \to \infty$, we have, according to
the definition of R.-S. integrals,

$$\int_a^b g_m \, dF_n \to \int_a^b g \, dF_n, \qquad \int_a^b g_m \, dF \to \int_a^b g \, dF, \qquad m \to \infty.$$

Upon selecting all subdivision points $x_{mk}$ to be continuity points of $F$,
it follows from $F_n \xrightarrow{w} F$ that, for every $m$ and every $k$, as $n \to \infty$,

$$F_n[x_{mk}, x_{m,k+1}) \to F[x_{mk}, x_{m,k+1}),$$

and, hence,

$$\int_a^b g_m \, dF_n = \sum_{k=1}^{k_m} g(x_{mk}) F_n[x_{mk}, x_{m,k+1}) \to \sum_{k=1}^{k_m} g(x_{mk}) F[x_{mk}, x_{m,k+1}) = \int_a^b g_m \, dF.$$

Since

$$\int_a^b g \, dF_n - \int_a^b g \, dF = \int_a^b (g - g_m) \, dF_n + \int_a^b g_m \, dF_n - \int_a^b g_m \, dF + \int_a^b (g_m - g) \, dF$$

and the first and last integrals on the right-hand side are bounded by
$\sup_{a \le x < b} |g(x) - g_m(x)| \to 0$ as $m \to \infty$, the assertion follows by letting
$n \to \infty$ and then $m \to \infty$.
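The lemma can be illustrated on a case where both sides are computable exactly. In this sketch (an assumed example, not from the text) $F_n$ puts mass $1/n$ at each of the points $0, 1/n, \cdots, (n-1)/n$, $F$ is the uniform d.f. on $[0, 1)$, $g(x) = x^2$, and $[a, b) = [0, 1)$.

```python
from fractions import Fraction

def g(x):
    return x * x

def integral_against_Fn(n, a=Fraction(0), b=Fraction(1)):
    """Integral of g over [a, b) with respect to F_n, the d.f. putting mass
    1/n at each of the points 0, 1/n, ..., (n-1)/n."""
    return sum(g(Fraction(k, n)) * Fraction(1, n)
               for k in range(n) if a <= Fraction(k, n) < b)

# Integral of x^2 dx over [0, 1): the limit d.f. F is uniform on [0, 1).
limit_integral = Fraction(1, 3)

errors = {n: abs(integral_against_Fn(n) - limit_integral) for n in (10, 100, 1000)}
```

For this example the error equals $\frac{1}{2n} - \frac{1}{6n^2}$, so it tends to 0 as the lemma asserts.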


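The extensions of this lemma will be based upon the obvious inequality

$$\text{(I)} \qquad \left| \int g \, dF_n - \int g \, dF \right| \le \left| \int g \, dF_n - \int_a^b g \, dF_n \right| + \left| \int_a^b g \, dF_n - \int_a^b g \, dF \right| + \left| \int_a^b g \, dF - \int g \, dF \right|$$

with $a$ and $b$ continuity points of $F$, provided the integrals exist and
are finite.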


A. EXTENDED HELLY-BRAY LEMMA. If $g(\pm\infty) = 0$, then $F_n \xrightarrow{w} F$ up
to additive constants implies $\int g\,dF_n \to \int g\,dF$.

Proof. Since $g$ is continuous and its limits as $x \to \pm\infty$ exist and
are finite, $g$ is bounded on $R$ and the integrals $\int g\,dF_n$ and $\int g\,dF$ exist
and are finite. Letting $n \to \infty$ and then $a \to -\infty$, $b \to +\infty$, it
follows that, out of the three right-hand side terms in (I), the second
converges to 0 by the Helly-Bray lemma, whereas the first and the
third ones are bounded by $\sup_{x \notin [a, b)} |g(x)| \to 0$. The assertion is proved.

B. HELLY-BRAY THEOREM. If $g$ is bounded on $R$, then $F_n \xrightarrow{c} F$ up
to additive constants implies $\int g\,dF_n \to \int g\,dF$.

Proof. Since $|g| \le c < \infty$, the integrals exist and are finite. Letting
$n \to \infty$ and then $a \to -\infty$, $b \to +\infty$, it follows that, out of the three
terms on the right-hand side of (I), the second converges to 0 by the
Helly-Bray lemma, whereas the first and the third ones are bounded,
respectively, by

\[ c\,\{\operatorname{Var} F_n - F_n[a, b)\} \to 0 \quad\text{and}\quad c\,\{\operatorname{Var} F - F[a, b)\} \to 0; \]

and the assertion follows.
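The difference between the extended lemma (weak convergence, $g$ vanishing at infinity) and the theorem (complete convergence, $g$ merely bounded) shows up on a standard example, sketched here under assumptions of our own: $F_n$ is the d.f. degenerate at $n$, so that $F_n \xrightarrow{w} F$ with $\operatorname{Var} F = 0$ but not completely. The helper names are hypothetical.

```python
def g_vanishing(x):
    """A continuous function with g(+-infinity) = 0."""
    return 1.0 / (1.0 + x * x)

def g_bounded(x):
    """A bounded continuous function that does not vanish at infinity."""
    return 1.0

def integral_point_mass(g, location):
    """Integral of g with respect to the d.f. degenerate at `location`."""
    return g(location)

ns = [1, 10, 100, 1000]
vanishing = [integral_point_mass(g_vanishing, n) for n in ns]   # tends to 0
bounded = [integral_point_mass(g_bounded, n) for n in ns]       # stays at 1
print(vanishing, bounded)
```

The integrals of the vanishing $g$ tend to $0 = \int g\,dF$, while those of the constant $g$ stay at 1: the mass escaping to infinity is invisible to weak convergence, which is why the theorem asks for complete convergence.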

Remark. All the results of these subsections extend, without further
ado, to d.f.’s $F$ on $R^N$ and continuous functions $g$ on $R^N$, with the usual
conventions for the symbols used above.


*11.4 Further extension and convergence of moments. Let $g$ on $R$ be
continuous and $F$ on $R$, with or without affixes, be a d.f. The integrals
we are interested in are finite Lebesgue-Stieltjes integrals of the form
$\int g\,dF$, that is, such that $\int |g|\,dF < \infty$; they are, therefore, absolutely
convergent improper Riemann-Stieltjes integrals.

We say that $|g|$ is uniformly integrable in $F_n$ if, as $a \to -\infty$, $b \to +\infty$,

\[ \int_a^b |g|\,dF_n \to \int |g|\,dF_n < \infty \quad\text{uniformly in } n; \]

in other words, given $\epsilon > 0$,

\[ \int |g|\,dF_n - \int_a^b |g|\,dF_n < \epsilon \]

for $a \le a_\epsilon$ and $b \ge b_\epsilon$ independent of $n$. Since $\int_a^b |g|\,dF_n$ does not
decrease as $a \downarrow -\infty$ and/or $b \uparrow +\infty$, it suffices to require the foregoing
conditions for some set of values of $|a|$ and $b$ going to infinity; for
example, that

\[ \int_{|x| \ge c_m} |g|\,dF_n \to 0 \quad\text{uniformly in } n \text{ as } c_m \to \infty \text{ with } m \to \infty. \]

We consider now properties of the foregoing integrals which follow
from the weak convergence of d.f.’s $F_n$; they contain the extensions of
the Helly-Bray lemma of the preceding subsection (we leave the
verification to the reader).

A. CONVERGENCE THEOREM. If $F_n \xrightarrow{w} F$ up to additive constants,
then

(i) $\liminf \int |g|\,dF_n \ge \int |g|\,dF$;

(ii) $|g|$ is uniformly integrable in $F_n$ $\Rightarrow$ $\int g\,dF_n \to \int g\,dF$;

(iii) $\int |g|\,dF_n \to \int |g|\,dF < \infty$ $\Leftrightarrow$ $|g|$ is uniformly integrable in $F_n$.


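Proof. Let $\pm c$ be continuity points of $F$, and use repeatedly the
Helly-Bray lemma.

(i) follows, by letting $n \to \infty$ and then $c \to +\infty$, from

\[ \int |g|\,dF_n \ge \int_{-c}^{+c} |g|\,dF_n \to \int_{-c}^{+c} |g|\,dF. \]

(ii) is proved as follows: Given $\epsilon > 0$, let $\int_{|x| \ge c} |g|\,dF_n < \epsilon$ for $c \ge c_\epsilon$, whatever be $n$. By
the Helly-Bray lemma, if $c' > c$ and $\pm c'$ (like $\pm c$) are continuity points of
$F$, then $\int_{c \le |x| < c'} |g|\,dF \le \epsilon$ and, letting $c' \to \infty$, we have $\int_{|x| \ge c} |g|\,dF \le \epsilon$
and hence $\int |g|\,dF < \infty$. Furthermore, by taking $c \ge c_\epsilon$, and
letting $n \to \infty$ and then $\epsilon \to 0$,

\[ \left| \int g\,dF_n - \int g\,dF \right| \le \int_{|x| \ge c} |g|\,dF_n + \left| \int_{-c}^{+c} g\,dF_n - \int_{-c}^{+c} g\,dF \right| + \int_{|x| \ge c} |g|\,dF \to 0. \]

(iii) $\Rightarrow$ follows from

\[ \int_{|x| \ge c} |g|\,dF_n \le \left| \int |g|\,dF_n - \int |g|\,dF \right| + \int_{|x| \ge c} |g|\,dF + \left| \int_{-c}^{+c} |g|\,dF - \int_{-c}^{+c} |g|\,dF_n \right| \]

by taking $c = c_0$ such that the second right-hand side term is less than
$\epsilon/3$, then $n \ge n_0$ such that the first and the third right-hand side terms
are less than $\epsilon/3$, and finally $c_\epsilon = \max (c_0, c_1, \dots, c_{n_0 - 1})$ where $c_k$
($k = 1, \dots, n_0 - 1$) are such that $\int_{|x| \ge c_k} |g|\,dF_k < \epsilon$; thus,

\[ \int_{|x| \ge c} |g|\,dF_n < \epsilon \]

for $c \ge c_\epsilon$ whatever be $n$.

(iii) $\Leftarrow$ follows by (ii) where $g$ is replaced by $|g|$.
This proves the last assertion and terminates the proof.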


Application. Let

\[ m^{(k)} = \int x^k\,dF(x), \quad k = 0, 1, 2, \dots, \qquad \mu^{(r)} = \int |x|^r\,dF(x), \quad r \ge 0 \]

define, respectively, the $k$th moment (if it exists) and the $r$th absolute
moment of the d.f. $F$ or, equivalently, of the finite part of a measurable
function $X$ with d.f. $F$; if $X$ is a r.v., then this definition coincides with
that given in 9.3. If $F$ possesses subscripts, we affix the same subscripts
to its moments.


B. MOMENT CONVERGENCE THEOREM. If, for a given $r_0 > 0$, $|x|^{r_0}$
is uniformly integrable in $F_n$, then the sequence $F_n$ is completely compact
and, for every subsequence $F_{n'} \xrightarrow{c} F$ and all $k, r \le r_0$,

\[ m_{n'}^{(k)} \to m^{(k)} \text{ finite}, \qquad \mu_{n'}^{(r)} \to \mu^{(r)} \text{ finite}. \]

Proof. According to the weak compactness theorem, there is a
subsequence $F_{n'}$ and a d.f. $F$ such that $F_{n'} \xrightarrow{w} F$. On the other hand, the
uniformity condition for $|x|^{r_0}$ implies that, for every $r \le r_0$,

\[ \int_{|x| \ge c} |x|^r\,dF_{n'}(x) \le c^{\,r - r_0} \int_{|x| \ge c} |x|^{r_0}\,dF_{n'}(x) \to 0 \quad\text{as } c \to +\infty \]

uniformly in $n'$, so that the uniformity condition holds for $|x|^r$.
Therefore, the preceding convergence theorem applies to every sequence
$m_{n'}^{(k)}$ and $\mu_{n'}^{(r)}$ with $k, r \le r_0$. In particular, taking $r = 0$, we obtain
$\operatorname{Var} F_{n'} \to \operatorname{Var} F$, so that $F_{n'} \xrightarrow{c} F$. The theorem is proved.


COROLLARY. If the sequence $\mu_n^{(r_0 + \delta)}$ is bounded for some $\delta > 0$, then
the conclusion of the foregoing theorem holds.

For $\mu_n^{(r_0 + \delta)} \le a < \infty$ implies that, as $c \to +\infty$,

\[ \int_{|x| \ge c} |x|^{r_0}\,dF_n(x) \le \frac{1}{c^{\delta}} \int_{|x| \ge c} |x|^{r_0 + \delta}\,dF_n(x) \le \frac{a}{c^{\delta}} \to 0, \]

so that the uniformity condition holds for $|x|^{r_0}$.

This corollary yields at once the following solution of the celebrated
“moment convergence problem” (Fréchet and Shohat).

C. If, for $k \ge k_0$ arbitrary but fixed, the sequences $m_n^{(k)} \to m^{(k)}$ finite,
then these sequences converge for every value of $k$, and their limits $m^{(k)}$ are
finite and are the moments of a d.f. $F$ such that there exists a subsequence
$F_{n'} \xrightarrow{c} F$.

If, moreover, these limits determine $F$ up to an additive constant, then
$F_n \xrightarrow{c} F$ up to an additive constant.

It suffices to apply the foregoing corollary and to observe that, if the
$m^{(k)}$ determine $F$ up to an additive constant, then all completely
convergent subsequences $F_{n'}$ have the same limit d.f. $F$ up to additive
constants.
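The role of uniform integrability in the moment convergence theorem can be seen on two discrete families. The sketch below is illustrative and rests on assumed examples that do not appear in the text: `escaping(n)` puts mass $1/n$ at $x = n$, `tamed(n)` puts mass $1/n$ at $x = \sqrt{n}$, and both converge completely to the d.f. degenerate at 0. All function names are hypothetical.

```python
import math

def abs_moment(atoms, r):
    """r-th absolute moment of a discrete d.f. given as (location, mass) pairs."""
    return sum(p * abs(x) ** r for x, p in atoms)

def tail(atoms, r, c):
    """Integral of |x|^r over {|x| >= c}."""
    return sum(p * abs(x) ** r for x, p in atoms if abs(x) >= c)

def escaping(n):
    """Mass 1/n at x = n: first absolute moment is 1 for every n."""
    return [(0.0, 1 - 1 / n), (float(n), 1 / n)]

def tamed(n):
    """Mass 1/n at x = sqrt(n): second moment is 1 for every n."""
    return [(0.0, 1 - 1 / n), (math.sqrt(n), 1 / n)]

ns = range(1, 2001)
c = 30.0
sup_tail_escaping = max(tail(escaping(n), 1, c) for n in ns)  # stays near 1
sup_tail_tamed = max(tail(tamed(n), 1, c) for n in ns)        # at most 1/c
print(sup_tail_escaping, sup_tail_tamed)
```

For `tamed` the second moments are bounded by $a = 1$, so the corollary (with $r_0 = 1$, $\delta = 1$) gives the tail bound $a/c$ and the first absolute moments $1/\sqrt{n}$ converge to 0, the moment of the limit. For `escaping` the tails do not become uniformly small and the first moments stay at 1.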


*11.5. Discussion. A d.f. $F$ determined up to additive constants
corresponds one-to-one to an interval function $F$ determined by
$F[a, b) = F(b) - F(a)$ which in turn corresponds one-to-one to a
measure $F$ on the Borel field in $R$ (4.4a), a subprobability (subpr.)
since $F(R) \le 1$.

Weak convergence of d.f.’s $F_n$ to $F$, all determined up to additive
constants, is equivalent to convergence of interval functions defined by
$F_n[a, b) \to F[a, b)$ for every $F$-continuity interval $[a, b)$, that is, with
$F\{a\} + F\{b\} = 0$, and we can still write $F_n \xrightarrow{w} F$. The above appearance of
subpr.’s permits us to extend propositions in 11.3 and 11.4 to noncontinuous
functions $g$. Since these propositions derive from the Helly-Bray lemma
11.3a, it will suffice to generalize it and the others will follow as before.
Denote by $D_g$ the set of discontinuities of a function $g$ on $R$ to $R$; it is a
Borel set (see §12). If $F(D_g) = 0$ we say that $g$ is $F$-a.e. continuous.


a. GENERALIZED HELLY-BRAY LEMMA. If $F_n \xrightarrow{w} F$ then

\[ \int_a^b g\,dF_n \to \int_a^b g\,dF \]

for every $F$-continuity interval $[a, b)$ and every $F$-a.e. continuous
function $g$ bounded on every bounded interval.


Proof. The method of proof of the Helly-Bray lemma in 11.3 applies
but for one necessary change due to the fact that our integrals are now
Lebesgue-Stieltjes ones so that instead of Riemann sums we use Darboux
sums: Instead of $g_m$ we need $\underline{g}_m$ and $\overline{g}_m$ defined by

\[ \underline{g}_m = \sum_{k=1}^{k_m} \underline{g}_{mk}\, I_{mk}, \qquad \overline{g}_m = \sum_{k=1}^{k_m} \overline{g}_{mk}\, I_{mk}, \]

where $I_{mk}$ are indicators of $F$-continuity intervals $J_{mk} = [x_{mk}, x_{m,k+1})$
of length $|J_{mk}|$ with $\sum_{k=1}^{k_m} J_{mk} = [a, b)$, $\sup_k |J_{mk}| \to 0$ as $m \to \infty$, and
where

\[ \underline{g}_{mk} = \inf\{g(x) : x \in J_{mk}\}, \qquad \overline{g}_{mk} = \sup\{g(x) : x \in J_{mk}\}. \]

Since as $n \to \infty$, by hypothesis, $F_n(J_{mk}) \to F(J_{mk})$ so that

\[ \int_a^b \underline{g}_m\,dF_n \to \int_a^b \underline{g}_m\,dF, \qquad \int_a^b \overline{g}_m\,dF_n \to \int_a^b \overline{g}_m\,dF, \]

while $F(D_g) = 0$ implies that $F$-a.e., as $m \to \infty$,

\[ \underline{g}_m \uparrow g \downarrow \overline{g}_m, \]

letting $n \to \infty$ then $m \to \infty$ in

\[ \int_a^b \underline{g}_m\,dF_n \le \int_a^b g\,dF_n \le \int_a^b \overline{g}_m\,dF_n \]

it follows that

\[ \int_a^b g\,dF_n \to \int_a^b g\,dF. \]

The lemma is proved.

So far we considered only numerical functions $g$. But all propositions
in 11.3 and 11.4 as well as the one above remain valid for complex-valued
$g = \Re g + i\,\Im g$ by, say, $\int g\,dF = \int (\Re g)\,dF + i \int (\Im g)\,dF$. In
fact, then, the inverses of the Helly-Bray lemma and of the Helly-Bray
theorem are valid because of the weak and complete convergence criteria
in 13.2. We shall leave these immediate extensions to the reader.


Several questions arise at once: Since Borel fields are generated by the 
class of open (of closed) sets, are subpr.’s determined by their values on 
such a class? Is weak convergence determined by the behaviour of 
subpr.’s on open (on closed) sets? Since weak and complete convergence 
are determined by convergence of integrals of some families of functions 
are there other such families? 


It will be convenient to discuss these questions for subpr.’s on Borel
fields of metric spaces: first, because this generality is needed for
“functional limit theorems” (see Chapter XII) and second, because the
proofs are not more involved than for the real line. However, this
generality creates two difficulties. First, we do not have intervals in
general metric spaces, hence no interval functions, and are reduced to
working directly with subpr.’s. Second, nontrivial continuous functions $g$
vanishing at infinity (that is, such that given $\epsilon > 0$ there is a compact $K$
with $|g| < \epsilon$ on $K^c$) may not exist. In fact, on the separable Banach
space $C[a, b]$ of continuous functions on $[a, b]$ to $R$ with the supremum
norm, the only continuous function vanishing at infinity is the zero
function. Yet this space is central to Ch. XII. Thus the extended Helly-Bray
lemma is useless. However, the Helly-Bray theorem, with integrals of
bounded continuous functions with respect to subpr.’s $\mu_n$, $\mu$, remains
meaningful. But, in the case of the real line, it corresponds to complete
convergence $\mu_n \xrightarrow{c} \mu$ or, equivalently, weak convergence of pr.’s $\mu_n/\mu_n(R)$
to a pr. $\mu/\mu(R)$ (excluding the trivial case of $\mu(R) = 0$). Thus, in the
general case we are led to consider only weak convergence of pr.’s to a pr.
and the corresponding “relative compactness”: As is easily seen, 11.2b
implies that a sequence of pr.’s $F_n$ on $R$ contains a subsequence which
converges weakly to a pr. if and only if for every $\epsilon > 0$ there is a compact
$K_\epsilon$ in $R$ with $F_n(K_\epsilon^c) < \epsilon$ for all $n$. Is there a similar criterion for metric
spaces? Answers to the foregoing questions are to be found in the next
section.


*§ 12. CONVERGENCE OF PROBABILITIES ON METRIC SPACES


Throughout this section and unless otherwise stated, with or without 
affixes 


1. $\mathcal{X}$ is a space with metric $d$ and Borel field $\mathcal{S}$ generated by the class
of its open (of its closed) sets; $U$, $C$, $K$ are its open, closed, compact sets,
respectively, and $\partial A = \bar{A} - A^\circ$ is the boundary of a set $A$ in $\mathcal{X}$.
Properties of metric spaces in 5.3 are to be used without further comment.

2. $P$ is a pr. on $\mathcal{S}$ and $A$ in $\mathcal{S}$ is a $P$-continuity set when $P(\partial A) = 0$;
$g$, $h$ are Borel functions on the Borel space $(\mathcal{X}, \mathcal{S})$ to the Borel line or
Borel space $(\mathcal{X}', \mathcal{S}')$, respectively. $D_g$ is the discontinuity set of $g$ and $g$
is $P$-a.e. continuous when $P(D_g) = 0$; similarly for $h$. If $g = I_A$ then
clearly $D_g = \partial A$. Note that for any function $h$ on $(\mathcal{X}, d)$ to $(\mathcal{X}', d')$, $D_h$
is a Borel set, since $D_h = \bigcup_r \bigcap_s D_{rs}$ where $r$ and $s$ vary over the positive rationals
and $D_{rs}$ are the open sets

\[ D_{rs} = \{x : d(x, y) < s,\ d(x, z) < s,\ d'(h(y), h(z)) \ge r \text{ for some } y, z\}. \]


For later use, we observe that except for a change of notation the same
proof as for 10.1a yields

CHANGE OF VARIABLE FORMULA. Let $P$ on $\mathcal{S}$ be a pr. Let $h$ be a Borel
function on $\mathcal{X}$ to $\mathcal{X}'$ and $g$ be a Borel function on $\mathcal{X}'$ to $R$. The distribution
$Ph^{-1}$ of $h$ defined by $Ph^{-1}(A') = P(h^{-1}(A'))$, $A' \in \mathcal{S}'$ (the Borel field in $\mathcal{X}'$),
determines the distribution of random variables $g(h)$, and

\[ \int_{\mathcal{X}} g(h)\,dP = \int_{\mathcal{X}'} g\,d(Ph^{-1}), \]

in the sense that if either integral exists so does the other one and then both
are equal.


The main concepts and results of this section originated with Alex- 
androv and their final form is primarily due to Prohorov. 


*12.1 Convergence. The basic theorem below is essentially due to
Alexandrov. Any of its six equivalent properties defines weak convergence
on $\mathcal{S}$ of pr.’s $P_n$ to a pr. $P$, and we write $P_n \xrightarrow{w} P$. The usual definition is
(ii): $\int g\,dP_n \to \int g\,dP$ for all bounded continuous functions $g$. Since
$1 = P_n(\mathcal{X}) \to P(\mathcal{X}) = 1$, this “weak” convergence is in fact complete
convergence.


A. CONVERGENCE CRITERIA. Let $P_n$, $P$ be pr.’s on the Borel field $\mathcal{S}$
of a metric space $(\mathcal{X}, d)$. Let $g$ be functions on $\mathcal{X}$ to $R$ and the integrals be
over $\mathcal{X}$.

The following six properties are equivalent and define $P_n \xrightarrow{w} P$:

I: $\int g\,dP_n \to \int g\,dP$
(i) for all bounded $P$-a.e. continuous $g$
(ii) for all bounded continuous $g$
(iii) for all bounded uniformly continuous $g$

II:
(iv) $\limsup P_n C \le PC$ for all closed sets $C$
(v) $\liminf P_n U \ge PU$ for all open sets $U$
(vi) $P_n A \to PA$ for all $P$-continuity sets $A$

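Proof. Clearly (i) $\Rightarrow$ (ii) $\Rightarrow$ (iii).

(iii) $\Rightarrow$ (iv): The function $g_m$ defined by $g_m(x) = e^{-m\,d(x, C)}$ is bounded
by 1 and uniformly continuous with $I_C \le g_m \downarrow I_C$ as $m \to \infty$. Thus
$P_n C \le \int g_m\,dP_n$ and, by the Fatou-Lebesgue theorem, as $n \to \infty$ then $m \to \infty$,

\[ \limsup P_n C \le \int g_m\,dP \to PC. \]

(iv) $\Leftrightarrow$ (v): The two properties are dual: each implies the other one
by complementation.

(v) $\Rightarrow$ (vi): Since (iv) and (v) are equivalent, by using both and the
fact that $A^\circ \subset A \subset \bar{A}$, we obtain

\[ PA^\circ \le \liminf P_n A^\circ \le \liminf P_n A \le \limsup P_n A \le \limsup P_n \bar{A} \le P\bar{A}. \]

Since $P(\bar{A} - A^\circ) = P(\partial A) = 0$ by hypothesis in (vi), we have
$PA^\circ = P\bar{A}$ so that in the above inequalities the extreme terms, hence all
the terms, coincide and $PA = \lim P_n A$.

(vi) $\Rightarrow$ (i): The method of proof of the Helly-Bray lemma in 11.5 still
applies but with another necessary change due to the fact that it is the
range space, and not the domain, of $g$ which is $R$. The sets $g^{-1}(c) =
\{x : g(x) = c\}$ are disjoint for distinct $c \in R$. Since $P(\mathcal{X})$ is finite, it
follows that $P(g^{-1}(c)) > 0$ only for a countable set $D$ of values of $c$. Since
$g$ is bounded there is a bounded interval $[a, b)$ with $g(\mathcal{X}) \subset [a, b)$. We
can take $a = x_{m1} < \cdots < x_{m,k_m+1} = b$ with no $x_{mk} \in D$ for
$k \le k_m$, $m = 1, 2, \dots$, and $\max_k\,(x_{m,k+1} - x_{mk}) \to 0$ as $m \to \infty$.

Let $I_{mk}$ be indicators of the $J_{mk} = g^{-1}[x_{mk}, x_{m,k+1})$, omit the empty $J_{mk}$,
set $\underline{g}_{mk} = \inf\{g(x) : x \in J_{mk}\}$, $\overline{g}_{mk} = \sup\{g(x) : x \in J_{mk}\}$, and

\[ \underline{g}_m = \sum_k \underline{g}_{mk}\, I_{mk}, \qquad \overline{g}_m = \sum_k \overline{g}_{mk}\, I_{mk}. \]

Each $J_{mk}$ is a $P$-continuity set, since $\partial J_{mk} \subset g^{-1}(x_{mk}) \cup g^{-1}(x_{m,k+1}) \cup D_g$.
Since, by (vi), $P_n(J_{mk}) \to P(J_{mk})$, it follows that, as $n \to \infty$,

\[ \int \underline{g}_m\,dP_n \to \int \underline{g}_m\,dP, \qquad \int \overline{g}_m\,dP_n \to \int \overline{g}_m\,dP, \]

while $P(D_g) = 0$, by hypothesis in (i), implies that, as $m \to \infty$, $P$-a.e.

\[ \underline{g}_m \uparrow g \downarrow \overline{g}_m. \]

Therefore, letting $n \to \infty$ then $m \to \infty$ in

\[ \int \underline{g}_m\,dP_n \le \int g\,dP_n \le \int \overline{g}_m\,dP_n \]

we obtain (i):

\[ \int g\,dP_n \to \int g\,dP. \]

The proof is terminated.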
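Properties (iv), (v) and (vi) can be watched at work on the simplest nontrivial case. The sketch below is an illustration under our own assumptions: $P_n$ is the pr. degenerate at $1/n$ on the real line, $P$ is degenerate at 0, sets are represented by their indicator functions, and only large $n$ are inspected as a stand-in for the limit. The names `point_mass`, `tail_of_sequence` and `sets` are hypothetical.

```python
def point_mass(location):
    """Pr. degenerate at `location`; a set is passed as its indicator function."""
    return lambda indicator: 1.0 if indicator(location) else 0.0

P = point_mass(0.0)
# P_n for large n only, as a proxy for liminf / limsup.
tail_of_sequence = [point_mass(1.0 / n) for n in range(100, 200)]

sets = {
    "closed C = {0}": lambda x: x == 0.0,
    "open U = (0, 1)": lambda x: 0.0 < x < 1.0,
    "continuity set (-1, 1)": lambda x: -1.0 < x < 1.0,
    "A = (-inf, 0], boundary {0}": lambda x: x <= 0.0,
}

for name, indicator in sets.items():
    values = [P_n(indicator) for P_n in tail_of_sequence]
    print(f"{name}: P_n in [{min(values)}, {max(values)}], P = {P(indicator)}")
```

For the closed set the inequality in (iv) is strict (0 against 1), for the open set the inequality in (v) is strict (1 against 0), and convergence $P_n A \to PA$ fails for $A = (-\infty, 0]$, whose boundary $\{0\}$ has $P$-measure 1, while it holds for the $P$-continuity set $(-1, 1)$.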


COROLLARY 1. If $P_n \xrightarrow{w} P$ then $P_n h^{-1} \xrightarrow{w} P h^{-1}$ for every $P$-a.e.
continuous $h$ on $\mathcal{X}$ to $\mathcal{X}'$, equivalently $\int g(h)\,dP_n \to \int g\,d(Ph^{-1})$ for all bounded
continuous $g$ on $\mathcal{X}'$ to $R$.

For, $PD_h = 0$ and $\overline{h^{-1}C} \subset (h^{-1}C) \cup D_h$ for every closed $C$, imply
$P(\overline{h^{-1}C}) = P(h^{-1}C)$ hence

\[ \limsup P_n(h^{-1}C) \le \limsup P_n(\overline{h^{-1}C}) \le P(\overline{h^{-1}C}) = P(h^{-1}C) \]

and, by A(iv), $P_n h^{-1} \xrightarrow{w} P h^{-1}$. The equivalence assertion results at once from the
change of variable formula by A(ii).

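COROLLARY 2. If $P_n \to P$ on $\mathcal{C} \subset \mathcal{S}$ where $\mathcal{C}$ is closed under finite
intersections and each open set is a countable union of members of $\mathcal{C}$, then
$P_n \xrightarrow{w} P$.

Proof. Let $U = \bigcup_{k=1}^{\infty} A_k$, $A_k \in \mathcal{C}$. By hypothesis,

\[ P_n(A_1 \cup A_2) = P_n(A_1) + P_n(A_2) - P_n(A_1 A_2) \to P(A_1) + P(A_2) - P(A_1 A_2) = P(A_1 \cup A_2) \]

and, by induction, for every integer $m$,

\[ P_n(A_1 \cup \cdots \cup A_m) \to P(A_1 \cup \cdots \cup A_m). \]

Since $U_m = \bigcup_{k=1}^{m} A_k \uparrow U$ as $m \to \infty$, there is an $m = m_\epsilon$ such that
$PU - \epsilon \le PU_m$. Therefore,

\[ PU - \epsilon \le PU_m = \lim_n P_n U_m \le \liminf P_n U \]

and, letting $\epsilon \downarrow 0$,

\[ \liminf P_n U \ge PU \]

so that A(v) holds, and $P_n \xrightarrow{w} P$.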


COROLLARY 3. Let $\mathcal{X}$ be separable and let $P_n \to P$ on $\mathcal{C} \subset \mathcal{S}$. Then
$P_n \xrightarrow{w} P$ if

(i) $\mathcal{C}$ is closed under finite intersections and, given $\epsilon > 0$ and open $U$,
for every $x \in U$ there is an $A \in \mathcal{C}$ with $x \in A^\circ \subset A \subset U$,
or

(ii) $\mathcal{C}$ consists of those finite intersections of open spheres which are
$P$-continuity sets.

Proof. (i): Since $\mathcal{X}$ is separable, given open $U$, there is a sequence
$(A_n)$ in $\mathcal{C}$ with $U = \bigcup A_n^\circ$ and $A_n \subset U$ so that $U = \bigcup A_n$ and Corollary
2 applies.

Note that the second condition on $\mathcal{C}$ in (i) is implied by: for every $x$
and every $r > 0$ there is an $A \in \mathcal{C}$ with

\[ x \in A^\circ \subset A \subset S_x(r), \text{ the open } r\text{-sphere about } x. \]

(ii): Since $\partial(AB) \subset \partial A \cup \partial B$ while $\partial S_x(r) \subset \{y : d(x, y) = r\}$ has
$P$-measure 0 except for countably many values of $r$, (i) applies, and the
proof is terminated.

*12.2 Regularity and tightness. Since the Borel field of the metric 
space X is generated by the class of open (of closed) sets, it is to be ex- 
pected that a pr. P on § would be determined by its restriction to such 
a class. 


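a. REGULARITY LEMMA. Every pr. $P$ on $\mathcal{S}$ is regular: given $A \in \mathcal{S}$
and $\epsilon > 0$, there are open $U_\epsilon$ and closed $C_\epsilon$ such that

\[ C_\epsilon \subset A \subset U_\epsilon \quad\text{and}\quad P(U_\epsilon - C_\epsilon) < \epsilon, \]

equivalently,

\[ PA = \sup_{C \subset A} PC = \inf_{U \supset A} PU. \]

Proof. The equivalence assertion is immediate. To prove the
$\epsilon$-assertion, let $\mathcal{C} \subset \mathcal{S}$ be the subclass of those Borel sets for which the assertion
holds.

$\mathcal{C}$ contains the class of closed sets $C$ since open $U_r = \{x : d(x, C) < r\} \downarrow C$
as $r \downarrow 0$. It is clearly closed under complementations. Also it is
closed under countable unions: Given $A_n \in \mathcal{C}$, $A = \bigcup A_n$, and $\epsilon > 0$, there are
$C_n \subset A_n \subset U_n$ with $P(U_n - C_n) < \epsilon/2^{n+1}$; take $U_\epsilon = \bigcup U_n$ and $C_\epsilon =
\bigcup_{n \le m} C_n$ with $m$ such that $P(\bigcup C_n - C_\epsilon) < \epsilon/2$, so that $C_\epsilon \subset A \subset U_\epsilon$
and $P(U_\epsilon - C_\epsilon) < \epsilon$. Thus $\mathcal{C} \subset \mathcal{S}$ is a $\sigma$-field containing the class of
closed sets, hence $\mathcal{C} = \mathcal{S}$.

COROLLARY. The set $\{\int g\,dP : g \text{ bounded uniformly continuous}\}$ determines $P$.

For, the functions $g_m$ defined by $g_m(x) = e^{-m\,d(x, C)}$ are bounded and
uniformly continuous with $g_m = 1$ on $C$ and $g_m \downarrow 0$ on $C^c$ as $m \to \infty$, so
that $\int g_m\,dP \to PC$.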
The concept of “tightness” below was named by Le Cam in a memoir
which followed within a year that of Prohorov and extended the whole
theory to much more general topological spaces than the metric ones.

A family $\mathcal{P}$ of pr.’s on $\mathcal{S}$ is said to be tight if for every $\epsilon > 0$ there is a
compact $K_\epsilon$ such that $PK_\epsilon^c < \epsilon$ for all $P \in \mathcal{P}$. We say that $\mathcal{P}$ lives on a
Borel set $\mathcal{X}_0$ if $P(\mathcal{X}_0) = 1$ for all $P \in \mathcal{P}$, equivalently, if $PA = P(A\mathcal{X}_0)$ for
every $A \in \mathcal{S}$ and for all $P \in \mathcal{P}$. If $\mathcal{P} = \{P\}$ is a singleton we replace
above the family $\mathcal{P}$ by the pr. $P$. Given a Borel set $\mathcal{X}_0$, the $\sigma$-field $\mathcal{S}_0 =
\{A : A \subset \mathcal{X}_0, A \in \mathcal{S}\}$ is the Borel field of the metric space $\mathcal{X}_0$ with its
relative topology. Thus the above definitions apply to families of pr.’s
on $\mathcal{S}_0 \subset \mathcal{S}$.
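On the real line the definition of tightness can be made concrete. The following sketch is illustrative only and assumes a family that is not discussed in the text: the centered normal pr.'s with standard deviation $\sigma \le \sigma_{\max}$, for which $P([-c, c]^c) = \operatorname{erfc}\bigl(c / (\sigma\sqrt{2})\bigr)$. The function names are hypothetical.

```python
import math

def mass_outside(c, sigma):
    """P(|X| > c) for X normal with mean 0 and standard deviation sigma."""
    return math.erfc(c / (sigma * math.sqrt(2.0)))

def compact_for(eps, sigma_max):
    """Search c = 1, 2, 4, ... until [-c, c] carries all but eps of the mass
    of every member with sigma <= sigma_max (the mass outside increases
    with sigma, so checking sigma_max is enough)."""
    c = 1.0
    while mass_outside(c, sigma_max) >= eps:
        c *= 2.0
    return c

eps = 1e-3
c = compact_for(eps, sigma_max=5.0)
worst = max(mass_outside(c, s) for s in (0.1, 1.0, 2.5, 5.0))
print(c, worst)
```

With $\sigma$ bounded, one compact $[-c, c]$ serves the whole family. If $\sigma$ is allowed to grow without bound, no such compact exists: for instance `mass_outside(c, 1000.0)` remains close to 1 for the `c` found above.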


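b. TIGHTNESS LEMMA. (i) If a pr. $P$ on $\mathcal{S}$ is tight then it lives on a
$\sigma$-compact $\mathcal{X}_0$ and $PA = \sup_{K \subset A} PK$ for every $A \in \mathcal{S}$.

(ii) Conversely, if $P$ on $\mathcal{S}$ lives on a $\sigma$-compact $\mathcal{X}_0$ or if $PA = \sup_{K \subset A} PK$
for every $A \in \mathcal{S}$ then $P$ is tight.

(iii) Every pr. $P$ on $\mathcal{S}$ is tight when $\mathcal{X}$ is separable and complete.

Proof. 1°. If $P$ is tight then for every $n$ there is a compact $K_n$ with
$PK_n^c < 1/n$, so that $P(\bigcap K_n^c) = 0$ and $P$ lives on the $\sigma$-compact $\mathcal{X}_0 =
\bigcup K_n$. Note that $\mathcal{X}_0$ is separable since compacts in metric spaces are
separable.

By a, $P$ is regular so that, given $A \in \mathcal{S}$ and $\epsilon > 0$, there is a closed
$C \subset A$ with $P(A - C) < \epsilon/2$. But for $n$ sufficiently large, $PK_n^c < \epsilon/2$
and $K_\epsilon = CK_n$ is compact with $K_\epsilon \subset C \subset A$. Since

\[ P(A - K_\epsilon) \le P(A - C) + P(C - K_\epsilon) < \epsilon/2 + P(\mathcal{X} - K_n) < \epsilon \]

and $\epsilon > 0$ is arbitrarily small, it follows that $PA = \sup_{K \subset A} PK$, and (i) is
proved.

Conversely, if $P$ lives on $\mathcal{X}_0 = \bigcup K_n$, that is, $P(\bigcup K_n) = 1$ then, given
$\epsilon > 0$, there is an $m$ such that $PK_\epsilon^c < \epsilon$ for compact $K_\epsilon = \bigcup_{n \le m} K_n$, and $P$
is tight. This proves the first assertion in (ii) and the second is
immediate.

2°. When $\mathcal{X}$ is separable then, for every $n$, open $1/n$-spheres $U_{n1}$,
$U_{n2}, \dots$ cover $\mathcal{X}$. Therefore, given a pr. $P$ on $\mathcal{S}$ and $\epsilon > 0$, for $k_n$
sufficiently large $PU_n^c < \epsilon/2^{n+1}$ with $U_n = \bigcup_{k \le k_n} U_{nk}$. When moreover $\mathcal{X}$ is
complete then the closure $K_\epsilon$ of the totally bounded set $\bigcap U_n$ is compact.
Since

\[ PK_\epsilon^c \le P\Bigl(\bigcup U_n^c\Bigr) < \sum_{n \ge 1} \epsilon/2^{n+1} = \epsilon/2, \]

$P$ is tight and (iii) is proved.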
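A. TIGHTNESS THEOREM. Let the family $\mathcal{P}$ of pr.’s on $\mathcal{S}$ be tight. Then

(i) $\mathcal{P}$ lives on a $\sigma$-compact set $\mathcal{X}_0$ and $PA = \sup_{K \subset A} PK$ for every $A \in \mathcal{S}$ and $P \in \mathcal{P}$.

(ii) The family $\mathcal{P}h^{-1} = \{Ph^{-1} : P \in \mathcal{P}\}$ is tight for every continuous
function $h$ on $\mathcal{X}$ to a metric space $\mathcal{X}'$.

Proof. The proof of (i) is exactly the same as that of b(i); it suffices
to observe that the compacts $K_n$ therein are the same for all $P \in \mathcal{P}$.
Note that, in general, b(ii) does not hold for families $\mathcal{P}$.

For (ii), given $\epsilon > 0$ there is a compact $K_\epsilon$ with $PK_\epsilon^c < \epsilon$ for all $P \in \mathcal{P}$.
Since $h$ on $\mathcal{X}$ to $\mathcal{X}'$ is continuous, $K'_\epsilon = h(K_\epsilon)$ is compact in $\mathcal{X}'$ and
$K_\epsilon \subset h^{-1}(K'_\epsilon)$ implies that for all $P \in \mathcal{P}$

\[ Ph^{-1}\bigl((K'_\epsilon)^c\bigr) = P\bigl(h^{-1}((K'_\epsilon)^c)\bigr) = P\bigl(h^{-1}(K'_\epsilon)\bigr)^c \le PK_\epsilon^c < \epsilon, \]

and $\mathcal{P}h^{-1}$ is tight. The proof is terminated.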


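Let $\mathcal{S}_0$ be the $\sigma$-field of Borel sets on a Borel set $\mathcal{X}_0 \subset \mathcal{X}$. Given a
family $\mathcal{P}^0$ of pr.’s on $\mathcal{S}_0$, its extension to $\mathcal{S}$ is defined by

\[ \mathcal{P} = \{P : PA = P^0(A\mathcal{X}_0),\ P^0 \in \mathcal{P}^0,\ A \in \mathcal{S}\}; \]

note that $P\mathcal{X}_0 = 1$ for all $P \in \mathcal{P}$.

COROLLARY. (i) If $\mathcal{P}^0$ on $\mathcal{S}_0$ is tight so is its extension $\mathcal{P}$ to $\mathcal{S}$.

(ii) If $P_n^0 \xrightarrow{w} P^0$ on $\mathcal{S}_0$ then their extensions $P_n \xrightarrow{w} P$ on $\mathcal{S}$.

For, upon taking $h$ to be the (continuous) identity mapping of $\mathcal{X}_0$ into
$\mathcal{X}$, A(ii) yields (i) and 12.1A Corollary 1 yields (ii).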


*12.3 Tightness and relative compactness. We say that a family $\mathcal{P}$ of
pr.’s on $\mathcal{S}$ is relatively compact if every sequence of members of $\mathcal{P}$
contains a subsequence which converges weakly to a pr. on $\mathcal{S}$. Thus
“relative compactness” is, in fact, relative sequential complete compactness.

The Prohorov theorem below is the second basic theorem of this section.

A. RELATIVE COMPACTNESS CRITERION. Let $\mathcal{X}$ be a separable complete
metric space. Then a family $\mathcal{P}$ of pr.’s on its Borel field $\mathcal{S}$ is relatively
compact if and only if $\mathcal{P}$ is tight. In fact, the “if” part holds for general metric
spaces $\mathcal{X}$.


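Proof. 1°. Let $\mathcal{P}$ be relatively compact. Since $\mathcal{X}$ is separable, for
every $r > 0$ there are open $r$-spheres $U_1, U_2, \dots$ which cover $\mathcal{X}$ so that
$V_n = U_1 \cup \cdots \cup U_n \uparrow \mathcal{X}$. Given $\epsilon > 0$, there is an $n$ such that $PV_n^c < \epsilon$
for all $P \in \mathcal{P}$: Otherwise, for every $n$ there is some $P_n \in \mathcal{P}$ with $P_n V_n \le
1 - \epsilon$ and, by relative compactness, the sequence $(P_n)$ contains a
subsequence $P_{n'} \xrightarrow{w}$ some pr. $P$ on $\mathcal{S}$; thus, by 12.1 A(v), for every $n$

\[ PV_n \le \liminf P_{n'} V_n \le \liminf P_{n'} V_{n'} \le 1 - \epsilon, \]

while $PV_n \uparrow 1$: contradiction. Applying this with $r = 1/k$ and $\epsilon/2^{k+1}$ in place
of $\epsilon$, $k = 1, 2, \dots$, and proceeding as in the proof of the last assertion of
12.2b, completeness of $\mathcal{X}$ yields a compact $K_\epsilon$ with $PK_\epsilon^c < \epsilon$ for all
$P \in \mathcal{P}$, so that $\mathcal{P}$ is tight.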


2°. For the “if” part we follow Billingsley who bypasses Prohorov’s
use of integral representation of linear functionals by hewing closely to
Halmos’ generation of Borel measures from “content” to “inner
content” to “outer extension.” The difference is that “content” is defined
by Halmos on the class of all compacts while here the corresponding set
function has the same properties but only on a subclass of compacts.

For the time being, assume that $\mathcal{X}$ is separable so that it has a countable
base of open spheres $U_1, U_2, \dots$; include $\mathcal{X}$ in this base.

Let $\mathcal{P}$ be tight so that for every $n$ there is a compact $K(n)$ with
$P(K(n))^c < 1/n$ for all $P \in \mathcal{P}$. Let $\mathcal{K}$ consist of all finite unions of sets
of the form $\bar{U}_m K(n)$. Thus the class $\mathcal{K}$ is countable, closed under finite
unions, and its members, to be denoted by $K$ with or without affixes,
are compact.

Given a sequence $(P_n)$ of members of $\mathcal{P}$, Cantor’s diagonal procedure
yields a subsequence $P_{n'} \to$ some $\lambda$ on $\mathcal{K}$. We have to prove that $P_{n'} \xrightarrow{w}$
some pr. $P$ on $\mathcal{S}$.

Let

\[ \lambda_0 U = \sup_{K \subset U} \lambda K, \qquad \lambda^0 A = \inf_{U \supset A} \lambda_0 U, \]

so that $\lambda$ is defined on $\mathcal{K}$, $\lambda_0$ on the class $\mathcal{U}$ of open sets $U$, and $\lambda^0$ on the
class of all subsets. We shall show that the restriction of $\lambda^0$ to $\mathcal{S}$ is
precisely the pr. $P$.

Clearly, $\lambda$ on $\mathcal{K}$ is nondecreasing, additive, and subadditive: $K_1 \subset
K_2 \Rightarrow \lambda K_1 \le \lambda K_2$, $\lambda(K_1 + K_2) = \lambda K_1 + \lambda K_2$, $\lambda(K_1 \cup K_2) \le \lambda K_1 + \lambda K_2$;
$\lambda_0$ and $\lambda^0$ are nondecreasing, and $\lambda^0 = \lambda_0$ on $\mathcal{U}$. We shall use these
properties without further comment.


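3°. $\lambda_0$ on $\mathcal{U}$ is $\sigma$-subadditive:
Let $K \subset U_1 \cup U_2$ and set

\[ C_1 = \{x \in K : d(x, U_1^c) \ge d(x, U_2^c)\}, \qquad C_2 = \{x \in K : d(x, U_2^c) \ge d(x, U_1^c)\}. \]

These closed sets, being contained in compact $K$, are compact and so are
$C_1 U_1^c$ and $C_2 U_2^c$. If $C_1 U_1^c \ne \emptyset$ and $x \in C_1 U_1^c$, then $x$ belongs to $U_2$
(since $K \subset U_1 \cup U_2$), so that $d(x, U_1^c) = 0 < d(x, U_2^c)$, hence $x \notin C_1$: contradiction.
Thus $C_1 \subset U_1$ and, by definition of $\mathcal{K}$, $C_1 \subset K_1 \subset U_1$ for some $K_1$;
similarly $C_2 \subset K_2 \subset U_2$.
Therefore, upon taking the supremum in $K$ in

\[ \lambda K \le \lambda(K_1 \cup K_2) \le \lambda K_1 + \lambda K_2 \le \lambda_0 U_1 + \lambda_0 U_2, \]

we obtain $\lambda_0(U_1 \cup U_2) \le \lambda_0 U_1 + \lambda_0 U_2$, so that $\lambda_0$ on $\mathcal{U}$ is subadditive
and, by induction, is finitely subadditive. Now, if $K \subset \bigcup U_n$ then, by
compactness, $K \subset V_m = \bigcup_{n \le m} U_n$ for some $m$. Therefore, upon taking
the supremum in $K$ in

\[ \lambda K \le \lambda_0 V_m \le \sum_{n \le m} \lambda_0 U_n \le \sum_n \lambda_0 U_n, \]

we obtain $\lambda_0(\bigcup U_n) \le \sum_n \lambda_0 U_n$ so that $\lambda_0$ on $\mathcal{U}$ is $\sigma$-subadditive.

For closed $C$ and open $U$, $\lambda_0 U \ge \lambda^0(UC) + \lambda^0(UC^c)$: Given $\epsilon > 0$, there
is a $K_1 \subset UC^c$ with $\lambda K_1 > \lambda_0(UC^c) - \epsilon/2$, and then there is a $K_2 \subset UK_1^c$
with $\lambda K_2 > \lambda_0(UK_1^c) - \epsilon/2$. Since $K_1$ and $K_2$ are disjoint and contained
in $U$,

\[ \lambda_0 U \ge \lambda(K_1 + K_2) = \lambda K_1 + \lambda K_2 > \lambda_0(UC^c) + \lambda_0(UK_1^c) - \epsilon \ge \lambda^0(UC^c) + \lambda^0(UC) - \epsilon \]

hence, letting $\epsilon \to 0$, the assertion is proved.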


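4°. $\lambda^0$ is an outer measure and Borel sets are $\lambda^0$-measurable:
Given $\epsilon > 0$ and $A_n \subset \mathcal{X}$ there are $U_n \supset A_n$ with $\lambda_0 U_n < \lambda^0 A_n +
\epsilon/2^{n+1}$. Since $\lambda_0$ is $\sigma$-subadditive,

\[ \lambda^0\Bigl(\bigcup A_n\Bigr) \le \lambda_0\Bigl(\bigcup U_n\Bigr) \le \sum \lambda_0 U_n \le \sum \lambda^0 A_n + \epsilon \]

so that, letting $\epsilon \to 0$, $\lambda^0$ is $\sigma$-subadditive. Since $\lambda^0$ is also nondecreasing,
$\lambda^0$ is an outer measure. Furthermore, for closed $C$ and open $U \supset A$,
upon taking the infimum in $U$ in

\[ \lambda_0 U \ge \lambda^0(UC) + \lambda^0(UC^c) \ge \lambda^0(AC) + \lambda^0(AC^c), \]

we obtain

\[ \lambda^0 A \ge \lambda^0(AC) + \lambda^0(AC^c), \]

so that closed sets are $\lambda^0$-measurable. Therefore, the Borel field $\mathcal{S}$ (that
the class of closed sets generates) is contained in the $\sigma$-field of
$\lambda^0$-measurable sets.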


5°. Let $P$ be the restriction of $\lambda^0$ to $\mathcal{S}$, so that $P$ on $\mathcal{S}$ is a measure;
in fact, $P$ is a pr. since

\[ 1 \ge P\mathcal{X} = \lambda_0 \mathcal{X} \ge \sup_n \lambda(K(n)) \ge \sup_n \Bigl(1 - \frac{1}{n}\Bigr) = 1. \]

Since for all open $U$

\[ PU = \lambda_0 U = \sup_{K \subset U} \lambda K, \]

upon taking the supremum in $K \subset U$ in

\[ \lambda K = \lim P_{n'} K \le \liminf P_{n'} U, \]

we obtain

\[ PU \le \liminf P_{n'} U. \]

Thus, by 12.1 A(v), $P_{n'} \xrightarrow{w} P$ and the “if” part is proved but under the
restriction of separability of $\mathcal{X}$.

Now, let $\mathcal{X}$ be a general metric space. By 12.2 A, $\mathcal{P}$ on $\mathcal{S}$, being tight,
lives on a $\sigma$-compact $\mathcal{X}_0$, a separable metric space in its relative topology.
Thus what precedes applies to the restriction $\mathcal{P}^0$ of $\mathcal{P}$ to the Borel field
$\mathcal{S}_0$ of $\mathcal{X}_0$. But, by 12.2A Corollary (ii), $P_{n'}^0 \xrightarrow{w} P^0$ on $\mathcal{S}_0$ implies $P_{n'} \xrightarrow{w} P$
on $\mathcal{S}$. The proof is terminated.


COROLLARY. Let $\mathcal{X}$ be separable and complete. Then $\mathcal{P}$ on $\mathcal{S}$ is
relatively compact if and only if, for every $\epsilon > 0$ and $r > 0$, there is a finite
union $V_n$ of open $r$-spheres with $PV_n^c < \epsilon$ for all $P \in \mathcal{P}$.


§ 13. CHARACTERISTIC FUNCTIONS AND DISTRIBUTION FUNCTIONS 


Pr. properties are properties describable in terms of distributions— 
and those are set functions. The introduction of d.f.’s makes it pos- 
sible to describe pr. properties in terms of point functions, easier to 
handle with the tools of classical analysis. Yet, to a distribution corre- 
sponds not a single d.f. F but the family of all functions F + ¢ where c 
is an arbitrary constant. The selection of one of them is somewhat 
arbitrary, and we have constantly to bear this fact in mind. The
introduction of characteristic functions (ch.f.) assigned to the family
$F + c$ by the relation

\[ f(u) = \int e^{iux}\,dF(x), \qquad u \in R, \]

obviates this difficulty and, moreover, is of the greatest practical
importance for the following reasons.


1° To the family F + ¢ corresponds a unique ch.f., and conversely. 
Therefore, there is a one-to-one correspondence between distributions 


and ch.f.’s. 




2° The methods and results of classical analysis are particularly 
well suited to the handling of ch.f.’s. In fact, ch.f.’s are continuous 
and uniformly bounded (by 1) functions. Moreover, to complete and 
weak convergence of d.f.’s (defined up to additive constants) corre- 
spond, respectively, ordinary convergence of ch.f.’s and ordinary con- 
vergence of their indefinite integrals. 

3° The oldest and, until recent years, almost the only general 
problem of pr. theory is the “Central Limit Problem,” concerned with 
the asymptotic behavior of d.f.’s of sequences of sums of independent 
r.v.’s. Much of Part III will be devoted to this problem. The d.f.’s
of such sums are obtained by “composition” of the d.f.’s of their
summands, and this “composition” involves repeated integrations and
results in unwieldy expressions, whereas the ch.f.’s of these sums are
simply the products of the ch.f.’s of the summands. The Central Limit
Problem was satisfactorily solved in the 15 years (1925-1940) which 
followed the establishment by P. Lévy of the properties of ch.f.’s. 


13.1 Uniqueness. The characteristic function (ch.f.) $f$ of a d.f. $F$ is
defined on $R$ by

\[ f(u) = \int e^{iux}\,dF(x) = \int \cos ux\,dF(x) + i \int \sin ux\,dF(x), \qquad u \in R. \]

Since, for every $u \in R$, the function of $x$ with values $e^{iux}$ is continuous
and bounded by 1, $f$ exists and is continuous and bounded by 1 on $R$.
Moreover, to all functions $F + c$, where $c$ is an arbitrary constant,
corresponds the same function $f$. The converse (and, thus, the one-to-one
correspondence between distributions and ch.f.’s) follows from the
formula below.


A. INVERSION FORMULA.

\[ F[a, b) = \lim_{U \to \infty} \frac{1}{2\pi} \int_{-U}^{+U} \frac{e^{-iua} - e^{-iub}}{iu}\, f(u)\,du, \]

provided $a < b$ are continuity points of $F$.
The inversion formula holds for all $a < b \in R$, provided $F$ is normalized.

We say that $F$ is normalized if the values of $F$ at its discontinuity points
$x$ are taken to be $\dfrac{F(x - 0) + F(x + 0)}{2}$. Normalization destroys the
continuity from the left of $F$ at its discontinuity points. However,
according to 11.1, the normalized d.f. determines the original one, so
that nothing is lost by normalization.

We observe that, in the integral which figures on the right-hand side
of the inversion formula, the integrand is defined at $u = 0$ by continuity,
so that it is continuous on $R$; also it is bounded on $R$ by its value $(b - a)
f(0)$ at $u = 0$. Thus, for every finite $U$, this integral is an ordinary
Riemann integral and, in proving the inversion formula, we shall find
that the limit of this integral, as $U \to \infty$, exists.

Proof. The proof uses repeatedly the dominated convergence
theorem applied to an interchange of integrations and is based on the
classical Dirichlet formula

\[ \frac{1}{\pi} \int_a^b \frac{\sin v}{v}\,dv \to 1 \quad\text{as } a \to -\infty,\ b \to +\infty, \]

so that the left-hand side is bounded uniformly in $a$ and $b$. Let

\[ I_U = \frac{1}{2\pi} \int_{-U}^{+U} \frac{e^{-iua} - e^{-iub}}{iu}\, f(u)\,du, \qquad a < b \in R, \]

and replace $f(u)$ by its defining integral $\int e^{iux}\,dF(x)$. We can
interchange the integrations, so that, by elementary computations,

\[ I_U = \int J_U(x)\,dF(x), \]

where

\[ J_U(x) = \frac{1}{\pi} \int_{U(x - b)}^{U(x - a)} \frac{\sin v}{v}\,dv. \]

Since $J_U$ is bounded uniformly in $U$, integration and passage to the
limit as $U \to \infty$ can be interchanged in

\[ \lim_{U \to \infty} I_U = \lim_{U \to \infty} \int J_U(x)\,dF(x). \]

Therefore

\[ \lim_{U \to \infty} I_U = \int J(x)\,dF(x) \]

where

\[ J(x) = \lim_{U \to \infty} J_U(x) = \begin{cases} 1 & \text{for } a < x < b, \\ \tfrac{1}{2} & \text{for } x = a,\ x = b, \\ 0 & \text{for } x < a,\ x > b, \end{cases} \]

and, hence,

\[ \lim_{U \to \infty} I_U = \tfrac{1}{2}\{F(a + 0) - F(a - 0)\} + \{F(b - 0) - F(a + 0)\} + \tfrac{1}{2}\{F(b + 0) - F(b - 0)\} \]
\[ = \frac{F(b - 0) + F(b + 0)}{2} - \frac{F(a - 0) + F(a + 0)}{2}. \]

Thus, if $F$ is normalized or if $a < b \in C(F)$, then

\[ \lim_{U \to \infty} I_U = F[a, b), \]

and the inversion formula is proved.
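The inversion formula lends itself to a direct numerical check. The sketch below is a minimal illustration, assuming the standard normal d.f. with ch.f. $e^{-u^2/2}$ (an example chosen here, not taken from the text) and a plain midpoint rule on $[-U, U]$; the names `inversion`, `normal_chf` and `normal_df` are hypothetical.

```python
import cmath
import math

def normal_chf(u):
    """Ch.f. of the standard normal d.f."""
    return math.exp(-0.5 * u * u)

def inversion(a, b, chf, U=12.0, steps=6000):
    """Midpoint-rule approximation of
    (1 / 2 pi) * integral over [-U, U] of (e^{-iua} - e^{-iub}) / (iu) * f(u) du."""
    h = 2.0 * U / steps
    total = 0.0 + 0.0j
    for j in range(steps):
        u = -U + (j + 0.5) * h
        if u == 0.0:
            kernel = b - a  # value of the kernel at u = 0, by continuity
        else:
            kernel = (cmath.exp(-1j * u * a) - cmath.exp(-1j * u * b)) / (1j * u)
        total += kernel * chf(u)
    return (total * h / (2.0 * math.pi)).real

def normal_df(x):
    """Standard normal d.f., used only to check the result."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

a, b = -1.0, 2.0
approx = inversion(a, b, normal_chf)
exact = normal_df(b) - normal_df(a)
print(approx, exact)
```

Because this ch.f. decays very fast, a moderate $U$ already stands in for the limit $U \to \infty$. For ch.f.'s that are not integrable the truncated integral converges much more slowly, in line with the Remark that the limit need not be an improper integral.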


Remark. If an improper Riemann integral $\int_{-\infty}^{+\infty} g\,dx$
exists and is finite, then

\[ \lim_{U \to \infty} \int_{-U}^{+U} g\,dx = \int_{-\infty}^{+\infty} g\,dx. \]

However, the left-hand side limit may exist and be finite (as in the
inversion formula), whereas the right-hand side improper integral does
not exist. Yet the inversion formula can be written in terms of an
improper Riemann integral as follows:

\[ F[a, b) = \frac{1}{\pi} \int_0^{\infty} \frac{\Im\{(e^{-iua} - e^{-iub}) f(u)\}}{u}\,du \]

where $\Im$ stands for “imaginary part of,” so that

\[ \Im\{(e^{-iua} - e^{-iub}) f(u)\} = (\cos ua - \cos ub)\,\Im f(u) - (\sin ua - \sin ub)\,\Re f(u). \]

It suffices to write $\int_{-U}^{+U} = \int_{-U}^{0} + \int_{0}^{U}$, change $u$ into $-u$ in the first
right-hand side integral, and take into account the fact that then the
integrand changes into its complex-conjugate.


COROLLARY. $F$ is differentiable at $a$ and its derivative $F'(a)$ at $a$ is
given by

$$(1)\qquad F'(a) = \lim_{h\to 0}\,\lim_{U\to\infty} \frac{1}{2\pi} \int_{-U}^{+U} \frac{1 - e^{-iuh}}{iuh}\, e^{-iua} f(u)\,du$$

if, and only if, the right-hand side exists.

In particular, if $f$ is absolutely integrable on $R$, then $F'$ exists and is
bounded and continuous on $R$ and, for every $x \in R$,

$$(2)\qquad F'(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-iux} f(u)\,du.$$

Proof. The first assertion follows directly from the inversion formula
by the definition of the derivative. The second assertion follows from
the first and from the assumption that $\int |f|\,du < \infty$ since, the
integrand in (1) being bounded by $|f|$, we have, in (1),

$$\lim_{U\to\infty} \int_{-U}^{+U} = \int_{-\infty}^{+\infty} \quad\text{and}\quad \lim_{h\to 0} \int_{-\infty}^{+\infty} = \int_{-\infty}^{+\infty} \lim_{h\to 0}.$$


Remark. Thus, if the ch.f.’s $f_n$ of d.f.’s $F_n$ are uniformly
Lebesgue-integrable on $R$, and if $f_n \to f$ ch.f. of $F$, then $f$ is
Lebesgue-integrable on $R$, and $F_n' \to F'$.

B. For every $x \in R$,

$$F(x + 0) - F(x - 0) = \lim_{U\to\infty} \frac{1}{2U} \int_{-U}^{+U} e^{-iux} f(u)\,du.$$

For we can interchange below the integrations and the passage to the
limit, so that

$$\lim_{U\to\infty} \frac{1}{2U} \int_{-U}^{+U} e^{-iux} f(u)\,du
= \lim_{U\to\infty} \int \Big\{ \frac{1}{2U} \int_{-U}^{+U} e^{iu(y-x)}\,du \Big\}\,dF(y)
= \lim_{U\to\infty} \int \frac{\sin U(y - x)}{U(y - x)}\,dF(y)
= F(x + 0) - F(x - 0).$$
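The middle expression in the proof of B can be evaluated in closed form for a purely discrete law, which gives a quick way to watch the limit isolate a single jump. The sketch below does this; the atoms and masses are hypothetical values chosen for illustration, and the helper names are not from the text.

```python
import math

def sinc_kernel(t):
    """sin(t)/t, with the limiting value 1 at t = 0."""
    return 1.0 if t == 0.0 else math.sin(t) / t

def jump_estimate(points, probs, x, U):
    """(1/2U) * int_{-U}^{U} exp(-iux) f(u) du for a discrete law, computed
    in closed form as sum_k p_k * sin(U(y_k - x)) / (U(y_k - x))."""
    return sum(p * sinc_kernel(U * (y - x)) for y, p in zip(points, probs))

points = [0.0, 1.0, 2.5]   # hypothetical atoms y_k
probs = [0.5, 0.3, 0.2]    # hypothetical masses p_k

for U in (1e2, 1e4, 1e6):
    at_atom = jump_estimate(points, probs, 1.0, U)      # tends to the jump 0.3
    off_atom = jump_estimate(points, probs, 0.7, U)     # tends to 0
    print(U, at_atom, off_atom)
```

At an atom the term with $y_k = x$ contributes its full mass while every other term is of order $1/U$, which is the mechanism behind the limit in B.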


13.2 Convergences. Since there is a one-to-one correspondence between
d.f.’s defined up to additive constants and ch.f.’s, it has to be
expected that a one-to-one correspondence also exists between the weak
and complete convergence, up to additive constants, of sequences of
d.f.’s and certain types of convergence (to be found) of ch.f.’s. For
this purpose we introduce the integral ch.f. $\hat f$ of $F$ defined on $R$ by

$$\hat f(u) = \int_0^u f(v)\,dv = \int \frac{e^{iux} - 1}{ix}\,dF(x).$$

The last integral is obtained upon replacing $f(v)$ by its defining integral
and noting that the interchange of integrations is permissible. Since
there is a one-to-one correspondence between $\hat f$ and its continuous
derivative $f$, it follows, by 12.1, that there is a one-to-one
correspondence between $\hat f$ and $F$ defined up to an additive constant.

We are now in a position to show that the weak and the complete
convergence up to additive constants of sequences of d.f.’s correspond
to the ordinary convergence of the corresponding sequences of integral
ch.f.’s and of ch.f.’s, respectively. Unless otherwise stated, a d.f., its
ch.f., and its integral ch.f. will be denoted by $F$, $f$, $\hat f$ respectively,
with the same affixes if any.

A. WEAK CONVERGENCE CRITERION. If $F_n \xrightarrow{w} F$ up to additive
constants, then $\hat f_n \to \hat f$. Conversely, if $\hat f_n$ converges to some
function $\hat g$, then there exists a d.f. $F$ with $F_n \xrightarrow{w} F$ up to
additive constants and $\hat f = \hat g$.

Proof. Since $\dfrac{e^{iux} - 1}{ix} \to 0$ as $x \to \pm\infty$, the first
assertion follows at once, by the extended Helly-Bray lemma, from the
definition of the integral ch.f.’s.

Conversely, let $\hat f_n \to \hat g$. According to the weak compactness
theorem, there is a d.f. $F$ and a subsequence $F_{n'} \xrightarrow{w} F$ as
$n' \to \infty$. Therefore, by the extended Helly-Bray lemma, for every $u \in R$,

$$\hat g(u) = \lim_{n'} \hat f_{n'}(u) = \lim_{n'} \int \frac{e^{iux} - 1}{ix}\,dF_{n'}(x) = \int \frac{e^{iux} - 1}{ix}\,dF(x) = \hat f(u).$$

Since $\hat f$ determines $F$ up to an additive constant, it follows that weakly
convergent subsequences of the sequence $F_n$ have the same limit $F$ up to
additive constants, with $\hat f = \hat g$. This proves the second assertion.


COROLLARY 1. Every sequence $\hat f_n$ of integral ch.f.’s is compact in the
sense of ordinary convergence on $R$.

For, in view of the above criterion, this statement is equivalent to
the weak compactness theorem for d.f.’s.

COROLLARY 2. If $f_n \to g$ a.e., then $F_n \xrightarrow{w} F$ up to additive
constants, with $f = g$ a.e.

Here “a.e.” is taken with respect to the Lebesgue measure on $R$.

Proof. Since $f_n \to g$ a.e. and the $f_n$ are continuous and uniformly
bounded by 1, it follows that $g$ is measurable and bounded a.e. so that,
by the dominated convergence theorem, $\hat f_n \to \hat g$ where $\hat g$ is defined
on $R$ by the Lebesgue integral

$$\hat g(u) = \int_0^u g(v)\,dv, \qquad u \in R.$$

Therefore, by the foregoing criterion, $F_n \xrightarrow{w} F$ up to additive
constants, and $\hat f = \hat g$. Since the derivative of $\hat f$ is $f$, whereas that
of the indefinite Lebesgue integral $\hat g$ exists and equals $g$ a.e., it
follows that $f = g$ a.e.


B. COMPLETE CONVERGENCE CRITERION. If $F_n \xrightarrow{c} F$ up to additive
constants, then $f_n \to f$. Conversely, if $f_n \to g$ continuous at $u = 0$,
then $F_n \xrightarrow{c} F$ up to additive constants, and $f = g$.

When the $F_n$ and $f_n$ are d.f.’s and ch.f.’s of r.v.’s, the converse becomes
the celebrated P. Lévy’s continuity theorem for ch.f.’s.

Proof. Let $F_n \xrightarrow{c} F$ up to additive constants. Then, by the
Helly-Bray theorem, for every $u \in R$,

$$f_n(u) = \int e^{iux}\,dF_n(x) \to \int e^{iux}\,dF(x) = f(u).$$

Conversely, let $f_n \to g$ continuous at $u = 0$. Then, for every $u \in R$,

$$\hat f_n(u) = \int_0^u f_n(v)\,dv \to \int_0^u g(v)\,dv = \hat g(u),$$

and, hence, by the weak convergence criterion, for some d.f. $F$ with ch.f. $f$,
$F_n \xrightarrow{w} F$ up to additive constants, and $\hat f = \hat g$. Therefore,

$$\frac{1}{u} \int_0^u f(v)\,dv = \frac{1}{u} \int_0^u g(v)\,dv$$

and, letting $u \to 0$, we obtain $f(0) = g(0)$ on account of continuity of
$f$ and of $g$ at the origin. Thus,

$$\operatorname{Var} F_n = f_n(0) \to g(0) = f(0) = \operatorname{Var} F,$$

and the proof is completed by taking into account the direct assertion.


C. UNIFORM CONVERGENCE THEOREM. If a sequence $f_n$ of ch.f.’s converges
to a ch.f. $f$, then the convergence is uniform on every finite interval.

Proof. On account of B, $F_n \xrightarrow{c} F$ up to additive constants.
Let $\epsilon > 0$ and $U > 0$ be arbitrarily fixed. We have

$$|f_n(u) - f(u)| \le \Big| \int_a^b e^{iux}\,dF_n(x) - \int_a^b e^{iux}\,dF(x) \Big|
+ \operatorname{Var} F_n - F_n[a, b) + \operatorname{Var} F - F[a, b)$$

where we take $a$, $b$ to be continuity points of $F$. Let $|a|$, $b$ and then
$n$ be so large that

$$\operatorname{Var} F - F[a, b) < \frac{\epsilon}{8}, \qquad
\operatorname{Var} F_n - F_n[a, b) < \operatorname{Var} F - F[a, b) + \frac{\epsilon}{8} < \frac{\epsilon}{4}.$$

It suffices to show that, for $n$ sufficiently large and all $u \in [-U, +U]$,

$$\Delta_n = \Big| \int_a^b e^{iux}\,dF_n(x) - \int_a^b e^{iux}\,dF(x) \Big| < \frac{\epsilon}{2}.$$

Let

$$a = x_1 < x_2 < \cdots < x_{N+1} = b$$

where the subdivision points are continuity points of $F$ and
$\alpha = \max_{k \le N} (x_{k+1} - x_k) < \epsilon/8U$. Since, by the mean value theorem,

$$|e^{iux} - e^{iux'}| \le |x - x'|\,U \quad\text{for } |u| \le U,$$

it follows that, upon replacing $x$ by $x_k$ in every interval $[x_k, x_{k+1})$, $\Delta_n$
is modified by at most

$$\alpha U \int_a^b dF_n(x) + \alpha U \int_a^b dF(x) \le 2\alpha U < \frac{\epsilon}{4}.$$

Thus, it remains to show that, for $n$ sufficiently large,

$$\Big| \sum_{k=1}^{N} e^{iux_k} \{F_n[x_k, x_{k+1}) - F[x_k, x_{k+1})\} \Big|
\le \sum_{k=1}^{N} |F_n[x_k, x_{k+1}) - F[x_k, x_{k+1})| < \frac{\epsilon}{4}.$$

Since $F_n[x_k, x_{k+1}) \to F[x_k, x_{k+1})$ for every $k \le N$, the last assertion
follows and the proof is complete.


Remark. In fact, we proved, with a supplementary detail, the first 
assertion of the complete convergence criterion without using the Helly- 
Bray theorem. 


COROLLARY 1. If $f_n \to f$ and $u_n \to u$ finite, then $f_n(u_n) \to f(u)$.

This follows, by C and continuity of $f$, from

$$|f_n(u_n) - f(u)| \le |f_n(u_n) - f(u_n)| + |f(u_n) - f(u)|.$$

COROLLARY 2. A set $\{F_t\}$ of d.f.’s is completely compact (up to additive
constants) if, and only if, the corresponding set $\{f_t\}$ of ch.f.’s is
equicontinuous at $u = 0$.

Proof. By 13.4B equicontinuity of $\{f_t\}$ at $u = 0$ is equivalent to
equicontinuity on $R$.

On the other hand, Ascoli’s theorem and its converse say that a set
of continuous functions is compact in the sense of uniform convergence
on a finite closed interval if, and only if, it is uniformly bounded and
equicontinuous on this interval. Since the $f_t$ are uniformly bounded,
the assertion follows by B and C.

Remark. If the d.f.’s $F_n$, $F$ of r.v.’s are differentiable and $F_n' \to F'$
on $R$, then $f_n \to f$ uniformly on $R$. It suffices to use 17 in Complements
and Details of Ch. II.


13.3 Composition of d.f.’s and multiplication of ch.f.’s. A function
$F$ on $R = (-\infty, +\infty)$ is said to be composed of d.f.’s $F_1$ and $F_2$, and
written $F_1 * F_2$, if

$$F(x) = \int F_1(x - y)\,dF_2(y), \qquad x \in R$$

where we assume, for simplicity, that $F_1(-\infty) = F_2(-\infty) = 0$; otherwise,
to avoid trivial complications, we would have to replace $F_1$ by
$F_1 - F_1(-\infty)$.

Since, for every fixed $y$, $F_1(x - y)$ are values of a d.f., nondecreasing,
continuous from the left and bounded by $F_1(-\infty) = 0$ and $F_1(+\infty) \le 1$,
it follows, upon applying the dominated convergence theorem, that $F$
has the same properties and that $\operatorname{Var} F = \operatorname{Var} F_1 \cdot \operatorname{Var} F_2$.

A. COMPOSITION THEOREM. If $F = F_1 * F_2$, then $f = f_1 f_2$, and conversely.


Proof. Let $F = F_1 * F_2$ and let $a = x_{n,1} < \cdots < x_{n,k_n+1} = b$ with
$\sup_k (x_{n,k+1} - x_{n,k}) \to 0$ as $n \to \infty$. Since, for every $u \in R$,

$$\int_a^b e^{iux}\,dF(x) = \lim_n \sum_k e^{iux_{n,k}} F[x_{n,k}, x_{n,k+1})
= \lim_n \int \sum_k e^{iu(x_{n,k} - y)} F_1[x_{n,k} - y, x_{n,k+1} - y)\, e^{iuy}\,dF_2(y),$$

it follows that

$$\int_a^b e^{iux}\,dF(x) = \int \Big\{ \int_{a-y}^{b-y} e^{iux}\,dF_1(x) \Big\} e^{iuy}\,dF_2(y)$$

and, letting $a \to -\infty$ and $b \to +\infty$,

$$\int e^{iux}\,dF(x) = \int e^{iux}\,dF_1(x) \int e^{iuy}\,dF_2(y),$$

so that $f = f_1 f_2$ and the first assertion is proved.

Conversely, according to the first assertion, $f_1 f_2$ is the ch.f. of $F_1 * F_2$
and, hence, on account of the one-to-one correspondence between $f$ and
$F + c$, $F = F_1 * F_2$ up to an additive constant. The converse is proved.
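For purely discrete laws the composition reduces to adding atoms and multiplying masses, so the identity $f = f_1 f_2$ can be checked directly. The following sketch does so; the two laws and the helper names `chf` and `compose` are hypothetical and serve only as an illustration.

```python
import cmath
from collections import defaultdict

def chf(pmf, u):
    """Ch.f. of a discrete law given as {atom: mass}."""
    return sum(p * cmath.exp(1j * u * x) for x, p in pmf.items())

def compose(pmf1, pmf2):
    """Masses of F1 * F2: mass p*q at x + y for every pair of atoms."""
    out = defaultdict(float)
    for x, p in pmf1.items():
        for y, q in pmf2.items():
            out[x + y] += p * q
    return dict(out)

pmf1 = {0.0: 0.4, 1.0: 0.6}              # hypothetical law
pmf2 = {-1.0: 0.2, 0.5: 0.5, 2.0: 0.3}   # hypothetical law
pmf = compose(pmf1, pmf2)

# Largest discrepancy between f and f1*f2 on a grid of arguments.
max_gap = max(abs(chf(pmf, u) - chf(pmf1, u) * chf(pmf2, u))
              for u in [k / 10.0 for k in range(-50, 51)])
print(max_gap)
```

The total mass of the composed law equals the product of the total masses, in agreement with $\operatorname{Var} F = \operatorname{Var} F_1 \cdot \operatorname{Var} F_2$.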


COROLLARY 1. A product of ch.f.’s is a ch.f. and, in particular, if $f$ is
a ch.f. so is $|f|^2$.

For $f = f_1 f_2$ is the ch.f. of the d.f. $F = F_1 * F_2$, and the particular case
follows from the fact that, if $f$ is a ch.f., so is its complex-conjugate $\bar f$
which corresponds to the d.f. $F(+\infty) - F(-x + 0)$.

COROLLARY 2. Composition of d.f.’s is commutative and associative.

For the corresponding multiplication of ch.f.’s has these properties.

13.4 Elementary properties of ch.f.’s and first applications. In the 
sequel, the elementary properties we establish now will play an impor- 
tant ancillary role, and the first applications will be used, improved, 
and generalized. 

We denote by $F$ and $f$, with same subscripts if any, corresponding
d.f.’s and ch.f.’s; in general, the corresponding d.f.’s $F$ are defined up to
additive constants, but if $f$ is ch.f. of a r.v., then, as usual, we take
$F(-\infty) = 0$, $F(+\infty) = 1$. We say that a r.v. $X$ is symmetric if $X$ and
$-X$ have the same d.f., that is, for every $x \in R$, $P[X < x] = P[X > -x]$.

A. GENERAL PROPERTIES. Every ch.f. $f$ is uniformly continuous and

$$|f| \le f(0) = \operatorname{Var} F \le 1, \qquad f(-u) = \bar f(u).$$

If $f$ is the ch.f. of a r.v. $X$, then the function with values $e^{iua} f(bu)$ is the
ch.f. of the r.v. $a + bX$. In particular, $\bar f$ is the ch.f. of $-X$ and $f$ is real
if, and only if, $X$ is symmetric.

Elementary inequality: $f(0) - \Re f(2u) \le 4(f(0) - \Re f(u))$.

Proof. The first assertion follows from $f(u) = \int e^{iux}\,dF(x)$. The second
assertion follows from $E e^{iu(a + bX)} = e^{iua} E e^{iubX}$. Finally, if $X$ is
symmetric, then $f(u) = E e^{iuX} = E e^{-iuX} = f(-u) = \bar f(u)$ so that $f$ is
real; conversely, if $f$ is real, then changing the signs of $a$ and $b$ in the
inversion formula is equivalent to taking the complex-conjugate of the
integrand and changing its sign, so that $F[a, b) = F[-b, -a)$ and,
hence, by letting $a \to -\infty$ and $b \uparrow x$, we have
$P[X < x] = F(x) = 1 - F(-x + 0) = P[X > -x]$.

The elementary inequality obtains upon integrating
$1 - \cos 2ux \le 4(1 - \cos ux)$ with respect to $F$.


B. INCREMENTS INEQUALITY: for any $u, h \in R$

$$|f(u) - f(u + h)|^2 \le 2 f(0) \{f(0) - \Re f(h)\}.$$

INTEGRAL INEQUALITY: for $u > 0$ there exist functions $0 < m(u) < M(u) < \infty$
such that

$$m(u) \int_0^u \{f(0) - \Re f(v)\}\,dv \le \int \frac{x^2}{1 + x^2}\,dF(x) \le M(u) \int_0^u \{f(0) - \Re f(v)\}\,dv;$$

if $f(0) = 1$, then, for $u$ sufficiently close to 0,

$$\int \frac{x^2}{1 + x^2}\,dF(x) \le -M(u) \int_0^u (\log \Re f(v))\,dv.$$

Proof. The increments inequality follows, by Schwarz’s inequality,
from

$$|f(u) - f(u + h)|^2 = \Big| \int e^{iux} (1 - e^{ihx})\,dF(x) \Big|^2
\le \int dF \int |1 - e^{ihx}|^2\,dF(x)
= 2 f(0) \int (1 - \cos hx)\,dF(x)
= 2 f(0) \{f(0) - \Re f(h)\}.$$
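The increments inequality lends itself to a direct numerical check. The sketch below evaluates both sides on a grid of $(u, h)$ for the ch.f. of a small discrete law; the law itself is a hypothetical choice made for illustration, with total mass 1 so that $f(0) = 1$.

```python
import cmath

pmf = {-2.0: 0.1, 0.0: 0.3, 0.5: 0.4, 3.0: 0.2}  # hypothetical law, f(0) = 1

def chf(u):
    """Ch.f. of the discrete law above."""
    return sum(p * cmath.exp(1j * u * x) for x, p in pmf.items())

def increments_gap(u, h):
    """Right-hand side minus left-hand side of the increments inequality."""
    lhs = abs(chf(u) - chf(u + h)) ** 2
    f0 = chf(0.0).real
    rhs = 2.0 * f0 * (f0 - chf(h).real)
    return rhs - lhs

grid = [k / 4.0 for k in range(-20, 21)]
worst = min(increments_gap(u, h) for u in grid for h in grid)
print(worst)  # nonnegative up to rounding error
```

Note that the bound on the right does not depend on $u$, which is what makes the inequality useful for passing from continuity at the origin to uniform continuity.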


The integral inequality follows, by the elementary inequality with $u > 0$,

$$0 < M^{-1}(u) \le u \Big(1 - \frac{\sin ux}{ux}\Big) \frac{1 + x^2}{x^2} \le m^{-1}(u) < \infty, \qquad x \in R,$$

from

$$\int_0^u dv \int (1 - \cos vx)\,dF(x) = \int u \Big(1 - \frac{\sin ux}{ux}\Big) \frac{1 + x^2}{x^2} \cdot \frac{x^2}{1 + x^2}\,dF(x).$$

The case $f(0) = 1$ follows then from the elementary inequality
$1 - a \le -\log a$ for $a \ge 0$.

The integral inequality permits us in turn to find bounds for
$\int_{|x| < c} x^2\,dF(x)$ and $\int_{|x| \ge c} dF(x)$, $(c > 0)$, by

$$(\mathrm{I})\qquad \frac{1}{1 + c^2} \int_{|x| < c} x^2\,dF(x) + \frac{c^2}{1 + c^2} \int_{|x| \ge c} dF(x)
\le \int \frac{x^2}{1 + x^2}\,dF(x)
\le \int_{|x| < c} x^2\,dF(x) + \int_{|x| \ge c} dF(x).$$


However, it is sometimes more convenient to use the direct 


B’. TRUNCATION INEQUALITY: for $u > 0$:

$$\int_{|x| < 1/u} x^2\,dF(x) \le \frac{3}{u^2} \{f(0) - \Re f(u)\},$$

$$\int_{|x| \ge 1/u} dF(x) \le \frac{7}{u} \int_0^u \{f(0) - \Re f(v)\}\,dv.$$

If $f(0) = 1$ and $u$ is sufficiently close to 0, then we can replace $1 - \Re f$ in
the foregoing by $-\log \Re f$.

These inequalities follow, respectively, from

$$\int (1 - \cos ux)\,dF(x) \ge \int_{|x| < 1/u} \frac{u^2 x^2}{2} \Big(1 - \frac{u^2 x^2}{12}\Big)\,dF(x)
\ge \frac{11 u^2}{24} \int_{|x| < 1/u} x^2\,dF(x)$$

and from

$$\frac{1}{u} \int_0^u dv \int (1 - \cos vx)\,dF(x) = \int \Big(1 - \frac{\sin ux}{ux}\Big)\,dF(x)
\ge (1 - \sin 1) \int_{|x| \ge 1/u} dF(x).$$

The case $f(0) = 1$ follows as in B.
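Both truncation inequalities can be tested numerically on a discrete law, for which $\int_0^u \{1 - \Re f(v)\}\,dv$ is available in closed form. The sketch below assumes a hypothetical law with $f(0) = 1$ and uses the constants $3$ and $7$ as they are stated above; all names are illustrative.

```python
import math

pmf = {-4.0: 0.05, -0.5: 0.25, 0.0: 0.2, 0.3: 0.3, 2.0: 0.2}  # hypothetical law

def re_f(u):
    """Real part of the ch.f. of the discrete law above."""
    return sum(p * math.cos(u * x) for x, p in pmf.items())

def integral_one_minus_re_f(u):
    """int_0^u (1 - Re f(v)) dv = sum_k p_k * (u - sin(u x_k) / x_k);
    an atom at 0 contributes nothing."""
    return sum(p * (u - (math.sin(u * x) / x if x != 0.0 else u))
               for x, p in pmf.items())

def truncation_gaps(u):
    """Right-hand side minus left-hand side for each truncation inequality."""
    second_moment = sum(p * x * x for x, p in pmf.items() if abs(x) < 1.0 / u)
    tail = sum(p for x, p in pmf.items() if abs(x) >= 1.0 / u)
    gap_moment = 3.0 / u**2 * (1.0 - re_f(u)) - second_moment
    gap_tail = 7.0 / u * integral_one_minus_re_f(u) - tail
    return gap_moment, gap_tail

us = [0.05 * k for k in range(1, 201)]
worst_moment = min(truncation_gaps(u)[0] for u in us)
worst_tail = min(truncation_gaps(u)[1] for u in us)
print(worst_moment, worst_tail)  # both nonnegative up to rounding error
```

The constants are not sharp: the proof gives $24/11$ and $1/(1 - \sin 1)$, which are rounded up to $3$ and $7$.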
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i?) 


Applications. 1° If $f_n \to g$ continuous at $u = 0$, then $g$ is continuous
on $R$.

This follows from the fact that the increments inequality with $f_n$
becomes, as $n \to \infty$, the same inequality with $g$.

2° If the sequence $f_n$ is equicontinuous at $u = 0$, then it is
equicontinuous at every $u \in R$.

For, then, as $h \to 0$,

$$|f_n(u) - f_n(u + h)|^2 \le 2 \{f_n(0) - \Re f_n(h)\} \to 0$$

uniformly in $n$.

3° If $f_n \to 1$ on $(-U, +U)$, then $f_n \to 1$ on $R$.

This follows by induction as $f_n(2u) \to 1$ for $|u| < U$ follows from

$$|f_n(u) - f_n(2u)|^2 \le 2 \{f_n(0) - \Re f_n(u)\} \to 0 \quad\text{for } |u| < U.$$


If we take into account the fact that the set of all differences of num- 
bers belonging to a set of positive Lebesgue measure contains a non- 
degenerate interval (—U, +U), this proposition can be improved as 
follows: 


If $f_n \to 1$ on a set $A$ of positive Lebesgue measure, then $f_n \to 1$ on $R$.

For, we can assume that the set $A$ is symmetric with respect to the
origin and contains it, since, for $u \in A$,

$$f_n(-u) = \bar f_n(u) \to 1, \qquad 1 \ge f_n(0) \ge |f_n(u)| \to 1,$$

and, then, $f_n(u - u') \to 1$ for $u, u' \in A$ on account of

$$|f_n(u) - f_n(u - u')|^2 \le 2 \{f_n(0) - \Re f_n(-u')\} \to 0.$$


4° We shall now prove an elegant proposition (slightly completed)
due to Kawata and Ugakawa. We use repeatedly Corollary 2 of the
weak convergence criterion which says that, if a sequence of ch.f.’s
$g_n \to g$ a.e., then the corresponding sequence of d.f.’s $G_n \xrightarrow{w} G$
up to additive constants and the ch.f. of $G$ coincides a.e. with $g$.

Let $g_n = \prod_{k=1}^{n} f_k \to g$ a.e. Either $g = 0$ a.e., and then
$G_n \xrightarrow{w} 0$ up to additive constants. Or $g \ne 0$ on a set $A$ of positive
Lebesgue measure, and then $G_n \xrightarrow{c} G$ up to additive constants.

Proof. In both cases $G_n \xrightarrow{w} G$ up to additive constants. The first
case follows from the recalled proposition. In the second case, we have
to prove that $\operatorname{Var} G_n \to \operatorname{Var} G$. Since
$\operatorname{Var} F^{s} = \operatorname{Var}(\bar F * F) = (\operatorname{Var} F)^2$
and $f^{s} = |f|^2$, it suffices to consider real-valued nonnegative ch.f.’s.
But then $\lim_{m\to\infty} \prod_{k=1}^{m} f_k$ exists on $R$ and coincides a.e. with a
ch.f., while, for $m$, $n$ sufficiently large, $g_m g_n \ne 0$ a.e. on $A$, and, as
$m \to \infty$ and then $n \to \infty$,

$$\prod_{k=n+1}^{m} f_k = g_m / g_n \to g / g_n = \prod_{k=n+1}^{\infty} f_k \to 1 \quad\text{a.e. on } A.$$

It follows, by 3°, that $\prod_{k=n+1}^{\infty} f_k \to 1$ a.e. on $R$. Therefore, if $H_n$
is the d.f. whose ch.f. coincides a.e. with $\prod_{k=n+1}^{\infty} f_k$, then
$\operatorname{Var} H_n \to 1$. But, by 11.2a and the composition theorem 13.3A,

$$\liminf \operatorname{Var} G_n \ge \operatorname{Var} G = \operatorname{Var} G_n \cdot \operatorname{Var} H_n.$$

It follows, by letting $n \to \infty$, that $\operatorname{Var} G_n \to \operatorname{Var} G$. The proof is
completed.


5° nearer "sk = 1,-+++, kn 2 ®, a — fink). 


Set ¥n(*) = 2 cr Tp ye hae) and a(¢)_= oP ps ms dF yu(*), 


else 
B(c) = sup > f | x an c > O finite. 
n k xi <c 


If $f_n = \prod_k f_{nk}$ with $f_{nk}$ real-valued, then the following properties are
equivalent:

(C1) the sequence $F_n$ is completely compact.

(C2) the sequence $\psi_n$ is equicontinuous at $u = 0$.

(C3) $\alpha(c) \to 0$ as $c \to \infty$ and $\alpha(c) + \beta(c) < \infty$ for every (some) $c$.

(C4) the sequence $\Psi_n$ is bounded and completely compact.


Proof. (C,) = (C2) by 13.2C Cor. 2 and the inequality 1 — a, S 
II(1 — a) S exp {—Yiax,},0 <a, <1. (Cy) = (C3) by B’ and (C3) 
=> (Ce) by yn(u) < 2a(c) + B(c)u?/2. Finally, (C3) < (Cs) and “‘some 


0” > “every c” by (I), a(c)e2/(1 + 2) < dVn(~) < a(c) and 
|z| 2c 
11.2B. 
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Let 

$$m^{(k)} = \int x^k\,dF(x), \qquad \mu^{(r)} = \int |x|^r\,dF(x), \qquad k = 0, 1, 2, \cdots,\quad r \ge 0,$$

be, respectively, the $k$th moments and the $r$th absolute moments of $F$.
Let $f^{(k)}$ be the $k$th derivative of $f$ ($f^{(0)} = f$) and, as usual, let $\theta$, $\theta'$ be
quantities with modulus bounded by 1.


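C. DIFFERENTIABILITY PROPERTIES. If $f^{(2n)}(0)$ exists and is finite,
then $\mu^{(r)} < \infty$ for $r \le 2n$.

If $\mu^{(n+\delta)} < \infty$ for a $\delta \ge 0$, then for every $k \le n$

$$f^{(k)}(u) = i^k \int e^{iux} x^k\,dF(x), \qquad u \in R,$$

and $f^{(k)}$ is continuous and bounded by $\mu^{(k)}$; moreover

$$f(u) = \sum_{k=0}^{n-1} m^{(k)} \frac{(iu)^k}{k!} + \rho_n(u), \qquad u \in R$$

where

$$\rho_n(u) = u^n \int_0^1 \frac{(1 - t)^{n-1}}{(n - 1)!} f^{(n)}(tu)\,dt = m^{(n)} \frac{(iu)^n}{n!} + o(u^n) = \theta \mu^{(n)} \frac{|u|^n}{n!}$$

and if $0 < \delta \le 1$, then

$$\rho_n(u) = m^{(n)} \frac{(iu)^n}{n!} + 2^{1-\delta} \theta' \mu^{(n+\delta)} \frac{|u|^{n+\delta}}{(1 + \delta)(2 + \delta) \cdots (n + \delta)}.$$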

Proof. To begin with, we observe that, since $|x|^{r'} \le 1 + |x|^r$ for
$r' < r$, finiteness of $\mu^{(r)}$ implies that of $\mu^{(r')}$.

The first assertion follows from the existence and finiteness of the
$2n$th symmetric derivative by using the Fatou-Lebesgue theorem in

$$|f^{(2n)}(0)| = \lim_{h\to 0} \int \Big(\frac{\sin hx}{hx}\Big)^{2n} x^{2n}\,dF(x) \ge \int x^{2n}\,dF(x).$$

The second assertion follows from the fact that, by differentiating
$\int e^{iux}\,dF(x)$ $k$ times under the integral sign, the integral so obtained is
absolutely convergent and, hence, this differentiation and the integration
can be interchanged.
The limited expansions follow by integrating the limited expansions
of $e^{iux}$ with corresponding forms of its remainder term. The last and
less usual corresponding form of its remainder is obtained upon observing
that $|e^{ia} - 1| \le 2|a/2|^{\delta}$ (since, for $0 < \delta \le 1$, if $|a/2| < 1$, then
$|e^{ia} - 1| \le |a| \le 2|a/2|^{\delta}$ and, if $|a/2| \ge 1$, then
$|e^{ia} - 1| \le 2 \le 2|a/2|^{\delta}$) and using successive integrations by parts in

$$\Big| (iux)^n \int_0^1 \frac{(1 - t)^{n-1}}{(n - 1)!} (e^{itux} - 1)\,dt \Big|
\le 2^{1-\delta} |ux|^{n+\delta} \int_0^1 \frac{(1 - t)^{n-1}}{(n - 1)!}\, t^{\delta}\,dt
= \frac{2^{1-\delta} |ux|^{n+\delta}}{(1 + \delta)(2 + \delta) \cdots (n + \delta)}.$$

COROLLARY. If all moments of $F$ exist and are finite, then
$f^{(k)}(0) = i^k m^{(k)}$ for every $k$, and

$$f(u) = \sum_{n=0}^{\infty} m^{(n)} \frac{(iu)^n}{n!}$$

in the interval of convergence of the series.
Applications. We consider d.f.’s $F$ and ch.f.’s $f$ of r.v.’s $X$, with the
same subscripts if any. If $m^{(1)} = EX = 0$, we write $\sigma^2$ instead of
$m^{(2)} = EX^2$.

1° NORMAL DISTRIBUTION. A “reduced normal” d.f. is defined by
$F'(x) = e^{-x^2/2}/\sqrt{2\pi}$. It is the d.f. of a r.v., since $m^{(0)} = 1$ by

$$\Big(\frac{1}{\sqrt{2\pi}} \int e^{-x^2/2}\,dx\Big) \Big(\frac{1}{\sqrt{2\pi}} \int e^{-y^2/2}\,dy\Big)
= \frac{1}{2\pi} \iint e^{-(x^2 + y^2)/2}\,dx\,dy
= \frac{1}{2\pi} \int_0^{2\pi} d\theta \int_0^{\infty} e^{-\rho^2/2} \rho\,d\rho = 1.$$

Since $F'(-x) = F'(x)$, it follows at once that the odd moments vanish,
while, by integration by parts, we obtain

$$m^{(2n)} = (2n - 1) m^{(2n-2)} = \cdots = (2n)!/2^n n!.$$

Therefore, by the foregoing corollary, the “reduced normal” ch.f. is

$$f(u) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} \Big(\frac{u^2}{2}\Big)^n = e^{-u^2/2}, \qquad u \in R.$$
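The passage from the moments to the ch.f. can be followed numerically: the sketch below builds the moments $(2n)!/2^n n!$ and sums the series $\sum_k m^{(k)} (iu)^k / k!$. The function names, the point `u = 1.3` and the truncation at 60 terms are illustrative assumptions.

```python
import math

def normal_moment(k):
    """k-th moment of the reduced normal law: 0 for odd k,
    (2n)! / (2^n n!) for k = 2n."""
    if k % 2 == 1:
        return 0
    n = k // 2
    return math.factorial(2 * n) // (2**n * math.factorial(n))

def chf_partial_sum(u, terms):
    """Partial sum of sum_k m^(k) (iu)^k / k! over k < terms."""
    return sum(normal_moment(k) * (1j * u) ** k / math.factorial(k)
               for k in range(terms))

u = 1.3
approx = chf_partial_sum(u, 60)
exact = math.exp(-0.5 * u * u)
print(approx, exact)
```

Only even powers survive, and each surviving term equals $(-u^2/2)^n / n!$, so the partial sums are those of the exponential series.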


2° BOUNDED LIAPOUNOV THEOREM. Let $|X_n| \le c < \infty$ and $EX_n = 0$.

If $s_n^2 = \sum_{k=1}^{n} \sigma_k^2 \to \infty$, then $\prod_{k=1}^{n} f_k(u/s_n) \to e^{-u^2/2}$ for every $u \in R$.

Since $E|X_n|^3 \le cEX_n^2$ and $\sigma_n^2 = EX_n^2 \le c^2$, it follows, upon fixing
$u$ arbitrarily, that

$$f_k\Big(\frac{u}{s_n}\Big) = 1 - \frac{\sigma_k^2}{2 s_n^2} u^2 + \theta_k \frac{c \sigma_k^2}{6 s_n^3} |u|^3 \to 1$$

uniformly in $k \le n$. Therefore, for $n$ sufficiently large,

$$\sum_{k=1}^{n} \log f_k\Big(\frac{u}{s_n}\Big) = -\frac{u^2}{2} (1 + o(1)) + \theta \frac{c |u|^3}{6 s_n} (1 + o(1)) \to -\frac{u^2}{2}$$

and the assertion is proved.
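A small experiment makes the convergence visible. The sketch below takes independent summands uniform on $[-c_k, c_k]$, which are bounded and centered with variance $c_k^2/3$ and ch.f. $\sin(c_k t)/(c_k t)$, and compares the normed product with $e^{-u^2/2}$. The alternating half-widths 1 and 2 are hypothetical choices, and the helper names are not from the text.

```python
import math

def uniform_chf(c, t):
    """Ch.f. sin(ct)/(ct) of the uniform law on [-c, c]."""
    return 1.0 if t == 0.0 else math.sin(c * t) / (c * t)

def normed_product(u, half_widths):
    """prod_k f_k(u / s_n), with s_n^2 the sum of the variances c_k^2 / 3."""
    s_n = math.sqrt(sum(c * c / 3.0 for c in half_widths))
    prod = 1.0
    for c in half_widths:
        prod *= uniform_chf(c, u / s_n)
    return prod

u = 1.0
for n in (10, 100, 2000):
    # Hypothetical scales: half-widths alternate between 1 and 2.
    half_widths = [1.0 if k % 2 == 0 else 2.0 for k in range(n)]
    print(n, normed_product(u, half_widths), math.exp(-0.5 * u * u))
```

The summands are not identically distributed here, which is the point of the theorem: only the uniform bound $c$ and the divergence of $s_n^2$ are used.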


§14. PROBABILITY LAWS AND TYPES OF LAWS


14.1 Laws and types; the degenerate type. Since there is a one-to-one
correspondence between distributions, d.f.’s defined up to an additive
constant, and ch.f.’s, they are different but equivalent “representations”
of the same mathematical concept which we shall call pr. law or, simply,
law. Moreover, to a given distribution on the Borel field $\mathcal{B}$ we can
always make correspond the finite part of a measurable function $X$ on
some pr. space $(\Omega, \mathcal{A}, P)$, and the restriction of $P$ to $X^{-1}(\mathcal{B})$ with
values $P[X \in S]$, $S \in \mathcal{B}$, is still another representation of the law defined
by the given distribution; there are many such measurable functions
and many such spaces. Nevertheless, the various representations of a
given law have their own intuitive value. Thus, for every law we have
a multiplicity of representations and we shall use them according to
convenience.

A law will be denoted by the symbol $\mathcal{L}$, with the same affixes if any
as the d.f. or the ch.f. which represents this law, and the terminology
and notations for operations on laws will be those introduced for d.f.’s;
in particular, if $F_n \xrightarrow{w} F$ we write $\mathcal{L}_n \xrightarrow{w} \mathcal{L}$, and if
$F_n \xrightarrow{c} F$ we write $\mathcal{L}_n \xrightarrow{c} \mathcal{L}$. The case of laws of r.v.’s
(with d.f.’s of variation 1) is by far the most important. The law of a r.v.
$X$ will be denoted by $\mathcal{L}(X)$, and if a sequence $\mathcal{L}(X_n)$ of laws of r.v.’s
converges completely (necessarily to the law $\mathcal{L}(X)$ of a r.v. $X$) we shall
drop “complete” and write $\mathcal{L}(X_n) \to \mathcal{L}(X)$. From now on a law will be
law of a r.v., unless otherwise stated.

The origin and the scale of values of measured quantities, say a r.v.
$X$, are more or less arbitrarily chosen. By modifying them we modify
linearly the results of measurements, that is, we replace $X$ by $a + bX$
where $a$ and $b > 0$ are finite numbers. If, moreover, the orientation of
values can be modified, then the only restriction on the finite numbers $a$
and $b$ is that $b \ne 0$. This leads us to assign to a law $\mathcal{L}(X)$ the family
$\mathcal{T}(X) = \{\mathcal{L}(a + bX)\}$ of all laws obtainable by changes of origin, scale,
and orientation, to be called a type of laws. If $b$ is restricted either to
positive or to negative values, the corresponding families of laws will
be called positive, resp. negative types of laws.

Letting $b \to 0$ we encounter a boundary case, the simplest and at
the same time the everywhere pervading degenerate type $\{\mathcal{L}(a)\}$ of laws
of r.v.’s which degenerate at some arbitrary but finite value $a$, that is,
such that $X = a$ a.s. The corresponding family of “degenerate” d.f.’s
is that of d.f.’s with one, and only one, point of increase $a \in R$ with
$F(a + 0) - F(a - 0) = 1$. The corresponding family of “degenerate”
ch.f.’s is that of all ch.f.’s of the form $f(u) = e^{iua}$, $u \in R$, so that their
moduli reduce to 1. The converse is also true and, more precisely,


a. A ch.f. is degenerate if, and only if, its modulus equals 1 for two
values $h \ne 0$ and $\alpha h \ne 0$ of the argument whose ratio $\alpha$ is irrational. In
particular, a ch.f. $f$ is degenerate if $|f(u)| = 1$ in a nondegenerate interval.

Proof. Since $|f(h)| = 1$, there is a finite number $a$ such that $f(h) = e^{iha}$
and, hence,

$$e^{-iha} f(h) = \int e^{ih(x - a)}\,dF(x) = 1.$$

Thus

$$\int [1 - \cos h(x - a)]\,dF(x) = 0$$

and, since the integrand is nonnegative, it follows that, for points $x$ of
increase of $F$, $\cos h(x - a) = 1$ so that $x' - x''$ is a multiple of $\dfrac{2\pi}{h}$ when
the points of increase $x'$, $x''$ are distinct. Replacing $h$ by $\alpha h$, we find
that $x' - x''$ is also a multiple of $\dfrac{2\pi}{\alpha h}$, which is impossible when $\alpha$ is
irrational unless there is only one point of increase. The particular case
follows.

Remark. The foregoing argument proves that, if $|f(h)| = 1$ for an
$h \ne 0$, then $f(u) = \sum_{k=-\infty}^{+\infty} p_k e^{iux_k}$, $u \in R$, where $p_k \ge 0$,
$\sum_{k=-\infty}^{+\infty} p_k = 1$ and $x_k = a + k \cdot \dfrac{2\pi}{h}$; the converse is immediate.

14.2 Convergence of types. If $\mathcal{L}(X_n) \to \mathcal{L}(X)$, then, for every $a$,
$b \ne 0$, $\mathcal{L}(a + bX_n) \to \mathcal{L}(a + bX)$, since $f_n \to f$ implies that
$e^{iua} f_n(bu) \to e^{iua} f(bu)$, $u \in R$. Thus, we may say that convergence of
sequences of laws to a law is, in fact, convergence of sequences of types to
a type. It may even happen that, given a sequence $\mathcal{L}(X_n)$ convergent or not,
we can proceed to changes of origin and of scale varying with $n$ and giving
rise to a convergent sequence $\mathcal{L}(a_n + b_n X_n)$. In the particular case
of consecutive sums $X_n$ of “independent” r.v.’s, a special form of the
problem of finding the sequences of laws which converge for given
changes of origin and of scale is the oldest and, until recently, was the
only limit problem of pr. theory; we shall investigate it in Part III.
Meanwhile there is an immediate question to answer: given a sequence
$\mathcal{L}(X_n)$ of laws, do all the limit laws of convergent sequences of the form
$\mathcal{L}(a_n + b_n X_n)$ belong to a same type? The answer, due to Khintchine
for positive types, is as follows:


A. CONVERGENCE OF TYPES THEOREM. If $\mathcal{L}(X_n) \to \mathcal{L}(X)$ nondegenerate
and $\mathcal{L}(a_n + b_n X_n) \to \mathcal{L}(X')$ nondegenerate, then the laws $\mathcal{L}(X)$ and
$\mathcal{L}(X')$ belong to the same type. More precisely, $\mathcal{L}(X') = \mathcal{L}(a + bX)$ with
$|b_n| \to |b|$, and if $b_n > 0$ then $b_n \to b$, $a_n \to a$.

However, for every finite $a$ and for every sequence $\mathcal{L}(X_n)$ of laws, there
exist numbers $a_n$ and $b_n \ne 0$ such that $\mathcal{L}(a_n + b_n X_n) \to \mathcal{L}(a)$.


In other words, given a sequence of laws, the changes of origin, scale, 
and orientation can yield in the limit no more than one nondegenerate 
type and can always yield in the limit the degenerate type. This shows 
once more that the degenerate type is to be considered as the “‘degen- 
erate part” of every type. 


Proof. The second assertion is immediate. For, by taking the numbers
$c_n$ sufficiently large so as to have $P[|X_n| \ge c_n] < \dfrac{1}{n} \to 0$, we obtain

$$P\Big[\Big|\frac{X_n}{n c_n}\Big| \ge \frac{1}{n}\Big] < \frac{1}{n} \to 0$$

and, it follows at once, that $\mathcal{L}\Big(\dfrac{X_n}{n c_n}\Big) \to \mathcal{L}(0)$, so that
$\mathcal{L}\Big(a + \dfrac{X_n}{n c_n}\Big) \to \mathcal{L}(a)$.

The first assertion means that $f_n \to f$ nondegenerate and
$e^{iua_n} f_n(b_n u) \to f'(u)$ nondegenerate, $u \in R$, imply existence of two finite
numbers $a$ and $b \ne 0$ such that $f'(u) = e^{iua} f(bu)$, $u \in R$. We can always
select from the sequence $b_n$ a convergent subsequence $b_{n'}$, but its limit may
be 0 or $\pm\infty$. If $b_{n'} \to 0$, then, since the convergence of ch.f.’s to a ch.f.
is uniform in every finite interval, we have, for every fixed $u \in R$,

$$|f'(u)| = \lim |f_{n'}(b_{n'} u)| = |f(0)| = 1,$$

so that, by 14.1a, $f'$ is degenerate and this contradicts the assumption.
Similarly, if $b_{n'} \to \pm\infty$, then, replacing $u$ by $\dfrac{u}{b_{n'}}$, it follows that

$$|f(u)| = \lim \Big| f'\Big(\frac{u}{b_{n'}}\Big) \Big| = |f'(0)| = 1$$

so that $f$ is degenerate and this contradicts the assumption. Thus
$b_{n'} \to b$ finite and different from 0. On the other hand, for all $u$
sufficiently close to 0, the continuous functions $f(u)$ and $f'(u)$ (with values
1 for $u = 0$) differ from 0; and we have, for $n'$ sufficiently large,

$$e^{iua_{n'}} = \frac{e^{iua_{n'}} f_{n'}(b_{n'} u)}{f_{n'}(b_{n'} u)} \to \frac{f'(u)}{f(bu)}$$

so that $\lim e^{iua_{n'}}$ exists and is finite for $|u| \le$ some $u_0 > 0$. But then
$\limsup |a_{n'}| < \infty$. Therefore, for any two convergent subsequences of
$(a_{n'})$, with limits $a'$ and $a''$, the corresponding differences $d_{n'}$ of their
terms satisfy $e^{iu d_{n'}} \to e^{iu(a' - a'')} = 1$ for $|u| \le u_0$. It follows, by 13.4
Application 3°, that $a' - a'' = 0$, hence $a_{n'} \to$ some $a \in R$ and
$f'(u) = e^{iua} f(bu)$, $u \in R$.

Clearly, it remains only to prove that $|b_n| \to |b|$. Let $b_{n'} \to b$
and $b_{n''} \to b'$, hence $a_{n'} \to a$ and $a_{n''} \to a'$; it suffices to prove that
if, for every $u$, $e^{iua} f(bu) = e^{iua'} f(b'u)$, then $|b| = |b'|$. Upon replacing
$b'u$ by $u$ and $\dfrac{b}{b'}$ by $c$, it suffices to prove that, if $|c| \le 1$ and, for
every $u$, $|f(u)|^2 = |f(cu)|^2$, then $|c| = 1$. But $|c| < 1$ entails, upon
replacing repeatedly $u$ by $cu$,

$$|f(u)|^2 = |f(cu)|^2 = \cdots = \lim_n |f(c^n u)|^2 = 1.$$

Thus, the nondegeneracy assumption excludes the possibility $|c| < 1$,
so that $|c| = 1$ and the proof is complete.


Remark. It is immediately seen that if we limit ourselves to, say,
positive types only, then, under the foregoing assumptions, $a_n \to a$
and $b_n \to b$. We leave to the reader to find conditions under which
this property remains valid for types.
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Corotuary. If, for every u, 
ef (dau) —> flu) and f(b’ nu) — f(u) 
where f is a nondegenerate ch. and bnb', > 0 for every n, then 


bn 
— 0 and —-— 1. 


Replace in the theorem X, by @’n + b'nXn. 


14.3 Extensions. The results and terminology of this chapter extend
at once to families of r.v.’s, and we shall content ourselves with a
few generalities.

The law of a random vector $X = \{X_1, \cdots, X_N\}$ with d.f. $F_X$ on $R^N$
is represented by the ch.f. $f_X$ on $R^N$ defined by the $N$-uple integral

$$f_X(u) = \int e^{iux}\,dF_X(x), \qquad ux = u_1 x_1 + \cdots + u_N x_N$$

or, explicitly, by

$$f_X(u_1, \cdots, u_N) = \underbrace{\int \cdots \int}_{N\text{-uple}} e^{i(u_1 x_1 + \cdots + u_N x_N)}\,d_1 d_2 \cdots d_N F_X(x_1, \cdots, x_N).$$

The integral which appears in the inversion formula becomes an $N$-uple
Riemann-Stieltjes integral $\int_{-U_1}^{+U_1} \cdots \int_{-U_N}^{+U_N}$ and the “kernel”
$\dfrac{e^{-iua} - e^{-iub}}{iu}$ becomes $\prod_{k=1}^{N} \dfrac{e^{-iu_k a_k} - e^{-iu_k b_k}}{iu_k}$.

We observe that there is a one-to-one correspondence between the
law of the random vector $X = \{X_1, \cdots, X_N\}$ and the laws of the r.v.’s
$uX = u_1 X_1 + \cdots + u_N X_N$, where $u$ varies over $R^N$, since

$$f_X(tu) = f_{uX}(t), \qquad t \in R$$

and, in particular, $f_X(u) = f_{uX}(1)$.

Finally, the law of a random function $X = \{X_t, t \in T\}$ is the set of
joint laws of all its finite subfamilies.


§15. NONNEGATIVE-DEFINITENESS; REGULARITY


15.1 Ch.f.’s and nonnegative-definiteness. The class of ch.f.’s has
been defined to be the class of Fourier-Stieltjes transforms of d.f.’s.
Conversely, given a continuous function $g$ on $R$, we can recognize
whether or not it is a ch.f. by applying the inversion formula: if the
right-hand side of the inversion formula exists and is nonnegative for
all pairs $a < b$ of finite numbers, then $g$ is a ch.f. up to a multiplicative
constant. If $g$ is absolutely integrable on $R$, then it suffices to apply
Corollary 1 of the inversion formula and verify that the function $F'$ is
nonnegative. A very important criterion of a different type is that of
nonnegative-definiteness that we investigate now.

Let $g$ be a real or complex-valued function on a set $D_S \subset R$ obtained
by forming all differences of the elements of a set $S \ne \emptyset$; for example,
$S = [0, U)$ and $D_S = (-U, +U)$, $S$ = set of all positive integers and
$D_S$ = set of all integers. Sets $D_S$ are necessarily symmetric with respect
to the origin $u = 0$ and contain it. We say that $g$ on $D_S$ is
nonnegative-definite if for every finite set $S_n \subset S$ and every real or
complex-valued function $h$ on $S_n$

$$\sum_{u, v \in S_n} g(u - v) h(u) \bar h(v) \ge 0;$$

we shall omit mention of $D_S$ when $D_S = R$.
a. If $g$ on $D_S$ is nonnegative-definite, then, for every $u \in D_S$,

$$g(0) \ge 0, \qquad g(-u) = \bar g(u), \qquad |g(u)| \le g(0).$$

If, moreover, $D_S \supset (-U, +U)$ and $g$ is continuous at the origin, then $g$
is uniformly continuous on the set of limit points of $D_S$.

Proof. We apply the defining relation with

$$S_1 = \{0\}, \qquad S_2 = \{0, u\}, \qquad S_3 = \{0, u, u'\}.$$

With $S_1$ we obtain $g(0) \ge 0$. It follows with $S_2$ that
$g(u) h(u) + g(-u) \bar h(u)$ is real and hence $g(-u) = \bar g(u)$ (take $h(u) = 1$
and $h(u) = i$). We use these two properties below.

The discriminant of a nonnegative quadratic form being nonnegative,
elementary computations with $S_2$ yield $|g(u)| \le g(0)$. For the last
assertion we exclude the trivial case $g(0) = 0$ which implies $g = 0$, and,
to simplify the writing, assume that $g(0) = 1$ (it suffices to replace $g$
by $g/g(0)$). The same discriminant property but with $S_3$ yields, by
elementary computations,

$$|g(u) - g(u')|^2 \le 1 - |g(u - u')|^2 - 2\Re\{\bar g(u) g(u') (1 - g(u - u'))\}.$$

Therefore, if $g$ is continuous at the origin, that is, if $g(u - u') \to g(0) = 1$
as $u' \to u$, then $g(u') \to g(u)$. The proof is complete.
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The foregoing proposition shows that a nonnegative-definite func- 
tion g on R continuous at the origin has properties similar to those of 
ch.f.’s. In fact, g coincides on R with a ch.f.—up to a multiplicative 
constant; and this is what we intend to prove now. According to a, 
if $g(0) = 0$, then $g = 0$ so that, by excluding this trivial case and
dividing by $g(0)$, we can and will assume from now on that $g(0) = 1$.


b. HERGLoTz LEMMA. J function g on the set Dg = {+--+ —2c, —c, 


QO, +e, +2c, ---} 25 nonnegative-definite if, and only if, it coincides on this 
tric 


set with a chf. flu) = J et® JF (x), 


—7/Cc 


Proof. We can assume that $c > 0$. If $g$ on $D_c$ is nonnegative-definite, then, for every integer $n$ and every finite number $x$,

\[ G_n(x) = \sum_{k=-n+1}^{n-1}\Big(1 - \frac{|k|}{n}\Big)g(kc)e^{-ikx} = \frac{1}{n}\sum_{j=1}^{n}\sum_{h=1}^{n} g((j - h)c)e^{-i(j-h)x} \ge 0. \]

Upon multiplying by $e^{ikx}$ with some fixed value of $k$ and integrating over $[-\pi, +\pi)$, we obtain

\[ \Big(1 - \frac{|k|}{n}\Big)g(kc) = \frac{1}{2\pi}\int_{-\pi}^{+\pi} e^{ikx}G_n(x)\,dx = \int_{-\pi/c}^{+\pi/c} e^{ikcx}\,dF_n(x) \]

where $F_n$ is a d.f. with $F_n(-\pi/c) = 0$, $F_n(+\pi/c) = g(0) = 1$. The “only if” assertion follows, on account of the weak compactness and Helly-Bray lemma, by letting $n \to \infty$ along a suitable subsequence of integers. The “if” assertion is immediate (as below).
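The two expressions for the Fejér-type sum used in the proof can be compared numerically. The following is a minimal sketch, assuming Python and taking for $g$ the Cauchy ch.f. $e^{-|u|}$; the function names and the values of $c$ and $n$ are illustrative choices, not part of the text.

```python
import cmath
import math

def g(u):
    """Ch.f. of the Cauchy law with unit scale (an assumed example)."""
    return math.exp(-abs(u))

def fejer_sum(g, c, n, x):
    """G_n(x) = sum over |k| < n of (1 - |k|/n) g(kc) exp(-ikx)."""
    return sum((1 - abs(k) / n) * g(k * c) * cmath.exp(-1j * k * x)
               for k in range(-n + 1, n))

def double_sum(g, c, n, x):
    """(1/n) sum over j, h = 1..n of g((j-h)c) exp(-i(j-h)x)."""
    total = 0j
    for j in range(1, n + 1):
        for h in range(1, n + 1):
            total += g((j - h) * c) * cmath.exp(-1j * (j - h) * x)
    return total / n

c, n = 0.5, 12
grid = [-math.pi + 2 * math.pi * m / 50 for m in range(50)]
values = [fejer_sum(g, c, n, x) for x in grid]
# For a nonnegative-definite g the sums are real and nonnegative.
print(min(v.real for v in values))
```

Both forms agree up to rounding, and the values stay nonnegative on the grid, as the lemma requires of a nonnegative-definite $g$.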


A. BOCHNER'S THEOREM. A function $g$ on $R$ is nonnegative-definite and continuous if, and only if, it is a ch.f.

Proof. The “if” assertion (Mathias) is immediate, since, if $g$ is a ch.f. with d.f. $G$, then, letting $u$ and $v$ range over an arbitrary but finite set in $R$,

\[ \sum_{u,v} g(u - v)h(u)\bar h(v) = \int \Big\{\sum_{u,v} e^{i(u-v)x}h(u)\bar h(v)\Big\}\,dG(x) = \int \Big|\sum_{u} e^{iux}h(u)\Big|^2\,dG(x) \ge 0. \]
Conversely, let $g$ on $R$ be nonnegative-definite and continuous. It coincides on $R$ with a ch.f. if it does so on the set $S_r$ (dense in $R$) of all rationals of the form $k/2^n$, $k = 0, \pm 1, \pm 2, \cdots$, $n = 1, 2, \cdots$. For every integer $n$, let $S_n$ be the corresponding subset of all rationals of the form $k/2^n$ so that $S_n \uparrow S_r$. Since $g$ is nonnegative-definite on $R$, it is nonnegative-definite on every $S_n$. Therefore, by b, there exist ch.f.'s $f_n$ such that $g(k/2^n) = f_n(k/2^n)$ whatever be $k$ and $n$. Since $S_n \uparrow S_r$, it follows that $f_n \to g$ on $S_r$. Let $0 < \theta, \theta_n < 1$, so that, by b,

\[ 1 - \Re f_n(\theta/2^n) = \int_{-\pi}^{+\pi}(1 - \cos\theta x)\,dF_n(2^nx) \le \int_{-\pi}^{+\pi}(1 - \cos x)\,dF_n(2^nx) = 1 - \Re g(1/2^n). \]

Therefore, by the elementary inequality $|a + b|^2 \le 2|a|^2 + 2|b|^2$ and the increments inequality, for every fixed $h = (k_n + \theta_n)/2^n$,

\[ |1 - f_n(h)|^2 \le 2|1 - f_n(k_n/2^n)|^2 + 4(1 - \Re f_n(\theta_n/2^n)) \le 2|1 - g(k_n/2^n)|^2 + 4(1 - \Re g(1/2^n)). \]

Since $g$ is continuous at the origin, it follows by 13.4, 2°, that the sequence $f_n$ of ch.f.'s is equicontinuous. Hence, by Ascoli's theorem, it contains a subsequence converging to a continuous function $f$, so that $g = f$ on $S_r$ and hence on $R$. Since by the continuity theorem $f$ is a ch.f., the proof is complete.
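The “if” assertion can be illustrated numerically: for a ch.f. the quadratic form $\sum g(u - v)h(u)\bar h(v)$ is real and nonnegative, while a continuous function that is not a ch.f. can make it negative. The sketch below assumes Python; the points, the weights, and the counterexample $1 + u^2$ (which violates $|g(u)| \le g(0)$) are illustrative choices.

```python
import math

def normal_chf(u):
    """Ch.f. of the standard normal law."""
    return math.exp(-u * u / 2.0)

def not_nonnegative_definite(u):
    """Continuous and even with value 1 at 0, but exceeds g(0) elsewhere."""
    return 1.0 + u * u

def quadratic_form(g, points, weights):
    """Return the sum over u, v of g(u - v) h(u) conj(h(v))."""
    total = 0j
    for u, hu in zip(points, weights):
        for v, hv in zip(points, weights):
            total += g(u - v) * hu * hv.conjugate()
    return total

points = [-1.3, -0.2, 0.0, 0.7, 2.5]
weights = [1 + 2j, -0.5j, 3.0 + 0j, -1 + 1j, 0.25 - 4j]

print(quadratic_form(normal_chf, points, weights))
print(quadratic_form(not_nonnegative_definite, [0.0, 1.0], [1 + 0j, -1 + 0j]))
```

With the two points $0, 1$ and weights $1, -1$ the second form equals $2g(0) - g(1) - g(-1) = -2$, so $1 + u^2$ is not nonnegative-definite.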

The “only if” assertion can be proved directly, and this direct proof will extend to a more general case: For every $T > 0$ and $x \in R$

\[ p_T(x) = \frac{1}{T}\int_0^T\!\!\int_0^T g(u - v)e^{-i(u-v)x}\,du\,dv \ge 0, \]

since, $g$ on $R$ being nonnegative-definite and continuous, the integral can be written as a limit of nonnegative Riemann sums. Let $u = v + t$, integrate first with respect to $v$ and set $g_T(t) = (1 - |t|/T)\,g(t)$ or $0$ according as $|t| \le T$ or $|t| \ge T$. The above relation becomes

\[ p_T(x) = \int e^{-itx}g_T(t)\,dt \ge 0. \]

Now multiply both sides by $\frac{1}{2\pi}\big(1 - \frac{|x|}{X}\big)e^{iux}$ and integrate with respect to $x$ on $(-X, +X)$. The relation becomes

\[ \frac{1}{2\pi}\int_{-X}^{+X}\Big(1 - \frac{|x|}{X}\Big)p_T(x)e^{iux}\,dx = \frac{1}{2\pi}\int \frac{\sin^2\frac12 X(t - u)}{\frac14 X(t - u)^2}\,g_T(t)\,dt. \]

The left-hand side is a ch.f. (since its integrand is a product of $e^{iux}$ by a nonnegative function) and the right-hand side converges to $g_T(u)$ as $X \to \infty$. Therefore, $g_T$ is the limit of a sequence of ch.f.'s. Since it is continuous at the origin, the continuity theorem applies and $g_T$ is a ch.f. Since $g_T \to g$ as $T \to \infty$, the same theorem applies, and the assertion is proved.


*Extension 1. The question arises whether in A continuity at the origin is necessary. Let $g$ on $R$ be nonnegative-definite and Lebesgue-measurable.

By integrating

\[ \sum_{u_j, u_k \in S_n} g(u_j - u_k)e^{-i(u_j - u_k)x} \ge 0, \quad x \in R, \]

with respect to every $u_j \in S_n$ over $(0, T)$, we obtain

\[ nT^n + n(n - 1)T^{n-2}\int_0^T\!\!\int_0^T g(u - v)e^{-i(u-v)x}\,du\,dv \ge 0. \]

Dividing by $n(n - 1)T^{n-2}$ and letting $n \to \infty$, it follows that

\[ \int_0^T\!\!\int_0^T g(u - v)e^{-i(u-v)x}\,du\,dv \ge 0. \]

Therefore, the direct proof of the “only if” assertion in A continues to apply, but instead of the continuity theorem use 12.2A Corollary 2, and we obtain $g = f$ ch.f. almost everywhere (in Lebesgue measure). The “if” assertion is modified accordingly. Thus (F. Riesz)

A'. A function $g$ on $R$ is nonnegative-definite and Lebesgue-measurable if, and only if, it coincides a.e. with a ch.f.


*Extension 2. It can be shown that Herglotz lemma remains valid with $D_c = \{-Nc, -(N - 1)c, \cdots, 0, \cdots, (N - 1)c, Nc\}$ whatever be the fixed integer $N$. Then, replacing $S_n$ and $S_r$ by their intersections with $(-U, +U)$ whatever be the fixed $U$, the proof of A remains valid. Thus (Krein)

A''. A function $g$ on $(-U, +U)$ is nonnegative-definite and continuous if, and only if, it coincides on $(-U, +U)$ with a ch.f.

Remark 1. The proofs of A and A'' use only the fact that $g$ is continuous at the origin, so that these theorems imply the last assertion in a.

Remark 2. The foregoing proofs show that in the definition of a nonnegative-definite $g$ it suffices to take $h(u) = e^{iux}$ where $x$ runs over $R$. Also if $g$ is Lebesgue-measurable, then the definition can be taken to be

\[ \int_0^{T_n}\!\!\int_0^{T_n} g(u - v)e^{i(u-v)x}\,du\,dv \ge 0 \]

for every $x \in R$ and a sequence $T_n \to \infty$.

According to the second extension, a function which coincides with a ch.f. on $(-U, +U)$ can be extended to a ch.f. on $R$. The problem which arises is under what conditions this extension is unique. This is part of the problem we investigate in the following subsection.


*15.2 Regularity and extension of ch.f.'s. According to 14.1a, if $f = 1$ on an interval $(-U, +U)$, then $f = 1$ on $R$. Also according to 13.4, 3°, if $f_n \to 1$ on $(-U, +U)$ then $f_n \to 1$ on $R$. Thus, in these cases a ch.f. is determined by its values on an interval, and convergence of a sequence of ch.f.'s on $R$ follows from its convergence on an interval. We intend to investigate more general conditions under which these properties hold. To simplify the writing, we assume that the ch.f.'s are those of r.v.'s, that is, take the value 1 at $u = 0$.


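a. If $\hat f$ is the integral ch.f. corresponding to the ch.f. $f$, then

\[ \left|\frac{\hat f(u + h) - \hat f(u - h)}{2h}\right|^2 \le \frac12\{1 + \Re f(h)\}. \]

For, from

\[ \frac{\sin^2 x}{x^2} = \frac{\sin^2\frac{x}{2}}{\big(\frac{x}{2}\big)^2}\cos^2\frac{x}{2} \le \cos^2\frac{x}{2} = \frac{1 + \cos x}{2}, \]

it follows, upon applying the Schwarz inequality, that

\[ \left|\frac{\hat f(u + h) - \hat f(u - h)}{2h}\right|^2 = \left|\int e^{iux}\frac{\sin hx}{hx}\,dF(x)\right|^2 \le \int \Big(\frac12 + \frac12\cos hx\Big)\,dF(x) = \frac12\{1 + \Re f(h)\}. \]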
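Both the elementary kernel inequality and the resulting bound can be checked numerically on a concrete law. The sketch below assumes Python and the standard normal law, for which $f(u) = e^{-u^2/2}$ and the integral ch.f. is $\hat f(u) = \int_0^u f(v)\,dv = \sqrt{\pi/2}\,\mathrm{erf}(u/\sqrt2)$; the function names are illustrative.

```python
import math

def integral_chf_normal(u):
    """Integral from 0 to u of exp(-v^2/2) dv, written with the error function."""
    return math.sqrt(math.pi / 2.0) * math.erf(u / math.sqrt(2.0))

def lhs(u, h):
    """Squared modulus of the symmetric difference quotient of the integral ch.f."""
    diff = integral_chf_normal(u + h) - integral_chf_normal(u - h)
    return (diff / (2.0 * h)) ** 2

def rhs(h):
    """(1/2){1 + Re f(h)} for the standard normal ch.f."""
    return 0.5 * (1.0 + math.exp(-h * h / 2.0))

def kernel_gap(x):
    """(1 + cos x)/2 - (sin x / x)^2, which should be nonnegative for x != 0."""
    return 0.5 * (1.0 + math.cos(x)) - (math.sin(x) / x) ** 2

print(lhs(0.0, 0.5), rhs(0.5))
```

The normal law is used only because its integral ch.f. has a closed form; any other ch.f. with a computable integral would serve equally well.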
We extend now the uniform convergence theorem 13.2C. Let $f_n$ be ch.f.'s.

b. If $f_n \to g$ on $(-U, +U)$ and $g$ is continuous at $u = 0$, then the $f_n$ are equicontinuous and the convergence is uniform.

Proof. Because of 13.4 (1°, 2°) and Ascoli's theorem, it suffices to prove that the $f_n$ are equicontinuous at $u = 0$. If this conclusion is not true, then there exist an $\epsilon > 0$, a sequence $n' \to \infty$, and a sequence $u_{n'} \to 0$, such that $|f_{n'}(u_{n'})| < 1 - \epsilon$ for all $n'$; given a positive $h \in (-U, +U)$, we take $m_{n'} = [h/u_{n'}]$ so that $m_{n'}u_{n'} \to h$. Upon applying a with $u = kh$ and summing over $k = -m + 1, -m + 3, \cdots, m - 1$, we obtain by the elementary inequality $|a_1 + \cdots + a_m|^2 \le m|a_1|^2 + \cdots + m|a_m|^2$

\[ \left|\frac{\hat f(mh) - \hat f(-mh)}{2mh}\right|^2 \le \frac12\{1 + \Re f(h)\}. \]

It follows that

\[ \left|\frac{1}{2m_{n'}u_{n'}}\int_{-m_{n'}u_{n'}}^{+m_{n'}u_{n'}} f_{n'}(v)\,dv\right|^2 \le \frac12\{1 + \Re f_{n'}(u_{n'})\} < 1 - \frac{\epsilon}{2} \]

and, letting $n' \to \infty$, we have

\[ \left|\frac{1}{2h}\int_{-h}^{+h} g(v)\,dv\right|^2 \le 1 - \frac{\epsilon}{2}. \]

Since $1 = f_n(0) \to g(0)$ and $g$ is continuous at $u = 0$, it follows, letting $h \to 0$, that $1 \le 1 - \epsilon/2$. Therefore, ad contrario, the $f_n$ are equicontinuous at $u = 0$, and the assertion is proved.


A. CONTINUITY THEOREM ON AN INTERVAL. If $f_n \to f_U$ on $(-U, +U)$ and $f_U$ is continuous at $u = 0$, then $f_U$ extends to a ch.f. $f$ on $R$; if the extension $f$ is unique, then $f_n \to f$ on $R$.

Proof. According to b, the $f_n$ are equicontinuous. Therefore, by Ascoli's theorem, the sequence $f_n$ is compact in the sense of uniform convergence and, since $f_n \to f_U$ on $(-U, +U)$, all its limit ch.f.'s coincide with $f_U$ on $(-U, +U)$. It follows that, if there is only one ch.f. $f$ which coincides with $f_U$ on $(-U, +U)$, then $f_n \to f$ on $R$.
The second part of the problem raised above is reduced to its first part: find ch.f.'s determined by their values on an interval $(-U, +U)$. A partial answer is given by the following theorem (Marcinkiewicz).

B. EXTENSION THEOREM FOR CH.F.'S. If the restriction $f_U$ of a ch.f. $f$ to an interval $(-U, +U)$ is regular or is the boundary function of a regular function, then $f_U$ determines $f$.

This theorem follows, by the unicity of analytic continuation, from the three propositions below of independent interest. Let $f(z) = \int e^{izx}\,dF(x)$, where $z = u + iv$ is a point of the complex plane $R_u \times R_v$.

a. $f(z)$ is regular in a circle $|z| < R$ if, and only if, for every positive $r < R$, $\int e^{r|x|}\,dF(x)$ is finite.


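Proof. The “if” assertion is immediate and it suffices to prove the “only if” assertion.
Let

\[ m^{(n)} = \int x^n\,dF(x) \quad\text{and}\quad \mu^{(n)} = \int |x|^n\,dF(x). \]

If $f(z)$ is regular for $|z| < R$, then, for every positive $r < R$,

\[ \sum \frac{1}{n!}|m^{(n)}|r^n < \infty, \]

and, in particular,

\[ \sum \frac{1}{(2n)!}\mu^{(2n)}r^{2n} < \infty. \]

Since

\[ (\mu^{(2n-1)})^{\frac{1}{2n-1}} \le (\mu^{(2n)})^{\frac{1}{2n}}, \]

it follows that

\[ \sum \frac{1}{(2n-1)!}\mu^{(2n-1)}r^{2n-1} < \infty \]

and, hence,

\[ \int e^{r|x|}\,dF(x) = \sum \frac{1}{n!}\mu^{(n)}r^n < \infty. \]

This proves the assertion.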


b. If $f(z)$ is regular in the circle $|z| < R$ or in the rectangle $|\Re z| < U$, $|\Im z| < R$, then $f(z)$ is regular in the strip $|\Im z| < R$.

Proof. The first assertion follows at once from a. As for the second assertion, let $V$ be the largest number such that $f(z)$ is regular in the circle $|z| < V$ and assume that $V < R$. According to a, $f(z)$ is regular in the strip $|\Im z| < V$. But it is also regular in the rectangle $|\Re z| < U$, $|\Im z| < R$ and, hence, in the circle whose radius equals $\min(R, \sqrt{U^2 + V^2})$. Therefore $V$ cannot be less than $R$ and the proof is concluded.


For every ch.f. $f$, we have $f(z) = f^+(z) + f^-(z)$ where

\[ f^+(z) = \int_0^{\infty} e^{izx}\,dF(x) \quad\text{and}\quad f^-(z) = \int_{-\infty}^{0} e^{izx}\,dF(x) \]

are regular for $\Im z > 0$ and $\Im z < 0$, respectively. Therefore, if, say, $f^+(z)$ is regular for $0 > \Im z > -R$, then $f(z)$ is regular for $0 > \Im z > -R$, so that the ch.f. with values $f(u)$ is the boundary function of a regular function. Thus, the following proposition completes the proof of the foregoing extension theorem.

c. $f^+(z)$ is regular for $0 > \Im z > -R$ if, and only if, for every positive $r < R$, $\int_0^{\infty} e^{rx}\,dF(x)$ is finite.

Proof. The “if” assertion is immediate. As for the “only if” assertion, we observe that, since $f^+(z)$ is regular for $\Im z > 0$ and continuous for $\Im z \ge 0$, regularity for $0 > \Im z > -R$ implies, by a well-known symmetry property, regularity for $|\Im z| < R$ and, hence, according to a, $\int_0^{\infty} e^{rx}\,dF(x)$ is finite for $0 < r < R$.


PARTICULAR CASES. Upon applying what precedes, we have

1° If $f_n(u) \to e^{iau}$ on $(-U, +U)$, then $f_n(u) \to e^{iau}$ for every $u \in R$.

2° If $f_n(u) \to e^{-u^2/2}$ on $(-U, +U)$, then $f_n(u) \to e^{-u^2/2}$ for every $u \in R$.

3° If $f_n \to f$ on $(-U, +U)$ and $f$ is ch.f. of a r.v. bounded either above or below, then $f_n \to f$ on $R$.

d. UNICITY LEMMA. Let $g(z)$ be regular for $\Im z > 0$ and continuous for $\Im z \ge 0$. If $g(z) = f^+(z)$ for $\Im z = 0$ then $g(z) = f^+(z)$ for $\Im z \ge 0$.

For, $h(z) = g(z) - f^+(z)$ being regular for $\Im z > 0$ and continuous for $\Im z \ge 0$ with $h(z) = 0$ for $\Im z = 0$ extends, by analytic continuation, to an entire function vanishing for $\Im z = 0$ hence vanishing everywhere.


*15.3 Composition and decomposition of regular ch.f.'s. Let $F$ denote the composed $F_1 * F_2$ of d.f.'s $F_1$ and $F_2$. In the case of $f$ or $f_1$, $f_2$ regular, the composition theorem 13.4A can be completed as follows:

A. COMPOSITION THEOREM FOR REGULAR CH.F.'S. $f(z)$ is regular in the strip $|\Im z| < R$ if, and only if, $f_1(z)$ and $f_2(z)$ are regular in $|\Im z| < R$.

This theorem follows at once, by 15.2a and b, from the

COMPOSITION LEMMA. If $F = F_1 * F_2$ then, for every $v$,

\[ \int e^{vx}\,dF(x) = \int e^{vx}\,dF_1(x)\int e^{vx}\,dF_2(x), \]

and there exist finite numbers $\alpha_j > 0$, $\beta_j \ge 0$ such that

\[ \int e^{vx}\,dF(x) \ge \alpha_j e^{-\beta_j|v|}\int e^{vx}\,dF_j(x), \quad j = 1, 2. \]

Proof. We exclude the trivial case of degenerate $F_1$ or $F_2$. The first assertion follows, using Fatou's lemma, in a way similar to that of the proof of the composition theorem 13.3A, whether the integrals are finite or not.

As for the second assertion, for every $b$, either

\[ \int e^{vx}\,dF_2(x) \ge \int_b^{\infty} e^{vx}\,dF_2(x) \ge e^{vb}F_2[b, +\infty) \]

or

\[ \int e^{vx}\,dF_2(x) \ge \int_{-\infty}^{b} e^{vx}\,dF_2(x) \ge e^{vb}F_2(b), \]

according as $v \ge 0$ or $v < 0$. Let $\beta_1$ be the larger of two finite numbers $|b_1|$ and $|b_2|$ such that

\[ a_1 = F_2[b_1, +\infty) > 0 \quad\text{and}\quad a_2 = F_2(b_2) > 0 \]

and let $\alpha_1$ be the smaller of $a_1$ and $a_2$. Then the inequalities above and the first assertion yield

\[ \int e^{vx}\,dF(x) \ge \alpha_1 e^{-\beta_1|v|}\int e^{vx}\,dF_1(x) \]

and the proof is complete.


COMPLEMENTS AND DETAILS 


Unless otherwise stated, functions $F$, with or without affixes, are d.f.'s of r.v.'s: $F(-\infty) = 0$, $F(+\infty) = 1$, and functions $f$, with same affixes if any, are corresponding ch.f.'s.

1. If $F$ is purely discontinuous and the discontinuity set is dense in $R$, then the nondecreasing inverse function is singular.

2. If $F_{X_n} \to F_X$ and $\mu$ is any limit point of the sequence $\mu(X_n)$ of medians of the $X_n$, then $\mu$ is a median of $X$. In particular, if $\mu(X)$ is the unique median of $X$, then $\mu(X_n) \to \mu(X)$. (Take $x' < \mu < x''$ to be continuity points of $F_X$, then $F_X(x') \le \frac12$ and, similarly, $F_X(x'') \ge \frac12$.)

3. P. Lévy's space. Let $\mathfrak{F}$ be the space of all d.f.'s $F$ of r.v.'s. Set $d(F, F')$ to be the infimum of all those $h$ for which $F(x - h) - h \le F'(x) \le F(x + h) + h$ whatever be $x \in R$.

(a) Draw a graph and interpret $d(F, F')$ geometrically by considering lengths of segments intercepted by the graphs of $F$ and $F'$ on parallels to the second bisector.

(b) The function $d$ so defined is a distance, and $(\mathfrak{F}, d)$ is a complete metric space.

(c) The following three assertions are equivalent:

\[ F_n \to F, \qquad d(F_n, F) \to 0, \qquad \int g\,dF_n \to \int g\,dF \]

for every function $g$ continuous and bounded on $R$.

(d) A set $S$ in $\mathfrak{F}$ is compact if, and only if, $F(x) \to 0$ as $x \to -\infty$ and $F(x) \to 1$ as $x \to +\infty$, uniformly on $S$.
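4. Establish the following correspondences for laws.

Binomial: $p_k = C_n^k p^k q^{n-k}$, $k \le n$, $f(u) = (pe^{iu} + q)^n$.

Poissonian: $p_k = \dfrac{\lambda^k}{k!}e^{-\lambda}$, $k = 0, 1, \cdots$, $f(u) = e^{\lambda(e^{iu} - 1)}$.

Uniform: $F'(x) = \dfrac{1}{b - a}$ in $(a, b)$, and 0 outside, $f(u) = \dfrac{e^{ibu} - e^{iau}}{i(b - a)u}$.

Cauchy: $F'(x) = \dfrac{a}{\pi\{a^2 + (x - b)^2\}}$, $a > 0$, $f(u) = e^{-a|u| + ibu}$.

Laplace: $F'(x) = \dfrac{1}{2a}e^{-|x - b|/a}$, $a > 0$, $f(u) = (1 + a^2u^2)^{-1}e^{ibu}$.

Normal: $F'(x) = \dfrac{1}{\sigma\sqrt{2\pi}}e^{-(x - m)^2/2\sigma^2}$, $\sigma > 0$, $f(u) = e^{imu - \frac{\sigma^2u^2}{2}}$.

Squared Normal ($m = 0$, $\sigma = 1$): $F'(x) = \dfrac{1}{\sqrt{2\pi x}}e^{-x/2}$ for $x > 0$ and $= 0$ for $x \le 0$, $f(u) = (1 - 2iu)^{-1/2}$.

$\Gamma$-type: $F'(x) = \dfrac{c^{\gamma}}{\Gamma(\gamma)}x^{\gamma - 1}e^{-cx}$ for $x > 0$, $c > 0$, $\gamma > 0$, $= 0$ for $x \le 0$, $f(u) = \Big(1 - \dfrac{iu}{c}\Big)^{-\gamma}$.

5. The composed $F_h$ of $F$ with the uniform distribution on $(-h, +h)$ is given by

\[ F_h(x) = \frac{1}{2h}\int_{x-h}^{x+h} F(y)\,dy, \qquad f_h(u) = \frac{\sin hu}{hu}f(u). \]

An absolutely convergent inversion integral follows:

\[ \frac{1}{2h}\int_x^{x+2h} F(y)\,dy - \frac{1}{2h}\int_{x-2h}^{x} F(y)\,dy = \frac{h}{\pi}\int \Big(\frac{\sin hu}{hu}\Big)^2 e^{-iux}f(u)\,du. \]

Deduce the continuity theorem.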
6. Let

\[ M_hf = \frac{1}{\pi h}\int \frac{\sin^2 hu}{u^2}\,|f(u)|^2\,du, \quad h > 0, \]

and let

\[ Mf = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T} |f(v)|^2\,dv. \]

(a) $M_hf$ is nondecreasing in $h$ and converges to 1 or $Mf$ according as $h \to \infty$ or $h \to 0$. $\lim_m \lim_n M_h\big(\prod_{k=m}^{n} f_k\big)$ is either 0 or 1 (identically in $h$).

(b) $Mf = \sum p_k^2$ where the $p_k$ are jumps of $F$; $Mf_1f_2 \ge Mf_1 \cdot Mf_2$; $M_hf = \int_{-2h}^{+2h}\big(1 - \frac{|x|}{2h}\big)\,dF^s(x)$ where $F^s$ is d.f. with ch.f. $f^s = |f|^2$. (The sum is the jump at 0 of $\mathcal{L}(X) * \mathcal{L}(-X)$ where $X$ is a r.v. with d.f. $F$.)

(c) If $f_n \to f$ with $Mf = 0$, then $Mf_n \to 0$; the converse is not necessarily true. If $\prod_{k=1}^{n} f_k \to f$, then $M\prod_{k=1}^{n} f_k \to Mf$.


7. A law is a “lattice” law if the only possible values are of form $a + ns$ only, $s > 0$; $n = 0, \pm 1, \cdots$; if $s$ is the largest possible, then $s$ is the “step” of the law. The step is well determined.

(a) A law is a lattice law if, and only if, $|f(u_0)| = 1$ for an $u_0 \ne 0$. The step $s$ is given by the property that $|f(u)| < 1$ in $0 < |u| < 2\pi/s$ and $|f(2\pi/s)| = 1$.

(b) Let $p_n = P[X = a + ns]$ where $X$ has a lattice law with step $s$. Then

\[ p_n = \frac{s}{2\pi}\int_{-\pi/s}^{+\pi/s} e^{-i(a + ns)u}f(u)\,du, \]

\[ F(x_2) - F(x_1) = \frac{s}{2\pi}\int_{-\pi/s}^{+\pi/s}\frac{e^{-iux_1} - e^{-iux_2}}{2i\sin\frac{su}{2}}\,f(u)\,du \]

where $x_1 = a + ms - \frac12 s$, $x_2 = a + ns + \frac12 s$, $n \ge m$.
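8. If the moment $m_n$ exists and is finite, then

\[ \log f(u) = \sum_{k=1}^{n}\frac{\alpha_k}{k!}(iu)^k + o(u^n). \]

The $\alpha_k$ are called semi-invariants; formally

\[ \sum_{n=1}^{\infty}\frac{\alpha_n}{n!}t^n = \log\sum_{n=0}^{\infty}\frac{m_n}{n!}t^n. \]

Deduce the expression of a few first semi-invariants in terms of moments, and conversely. Prove that

\[ |\alpha_k| \le k^k\mu_k. \]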


(log uF — = oh ij is majorized by 2 Ci — 1)*.) 


9. If the derivative $F'$ on $R$ exists and is finite, then $f(u) \to 0$ as $|u| \to \infty$. (Use Riemann-Lebesgue lemma.)
If the $n$th derivative $F^{(n)}$ on $R$ exists, is finite, and is absolutely integrable, then $f(u) = o(|u|^{1-n})$ for $|u| \to \infty$. (Integrate by parts.)

10. Let $X$ be a r.v. with d.f. $F$.

(a) If $P[|X| \ge x] \to 0$ as $x \to \infty$ faster than any power of $x^{-1}$, then all moments exist and are finite. (Integrate by parts $\int |x|^n\,dF(x)$.)

A pr. law is determined by the sequence of moments assumed finite if the series $\sum_{n=0}^{\infty}\frac{m_n}{n!}u^n$ has a nonnull radius of convergence $\rho$. (Use Schwarz's inequality to show that the series with the $m_n$ replaced by $\mu_n$ majorizes the expansion of $f$ about any value of $u$, and then use analytic continuation.)

(b) Formally, by integration by parts,

\[ f(z) = 1 - iz\int_{-\infty}^{0} e^{izx}F(x)\,dx + iz\int_0^{\infty} e^{izx}(1 - F(x))\,dx. \]

If $P[|X| \ge x] \to 0$ as $x \to \infty$ faster than $e^{-rx}$ for every positive $r < \rho$, then $f(z)$ is analytic in the strip $|\Im z| < \rho$. If $\rho = \infty$, then $f(z)$ is an entire function.

(c) If $e^{|x|^r}F'(x) \ge c > 0$ on $R$ for an $r < 1$, then the pr. law is not determined by its moments.


11. If $f'$ exists and is finite on $R$, $\int |x|\,dF(x)$ may be infinite: take $f(u) = c\sum_{n=2}^{\infty}\dfrac{\cos nu}{n^2\log n}$. (The differentiated series converges uniformly but $\sum 1/n\log n = \infty$.) Let $m' = \lim_{a\to\infty}\int_{-a}^{+a} x\,dF(x)$ be the “symmetric” first moment. If $m'$ exists and is finite, $f'(0)$ may not exist: take a Weierstrass nondifferentiable function $c\sum a^n\cos b^nu$.
If the derivative at $u = 0$ of $\Re f$ exists, then


i — = o(1) + if.” "¢dF(x), 0<h > 0. 


(Set G(x) = F(x) — Go), H(«) = F(x) + F(—x), so that | AH |< AG. Show 

that Ue 8 dG(x) ~0ash—0O, [, sin OH) a(x) = 9(1).) 
Under the foregoing condition, $f'(0)$ exists and is finite if, and only if, $m' = \lim_{a\to\infty}\int_{-a}^{+a} x\,dF(x)$ exists and is finite, and then $f'(0) = im'$. Extend to any derivative of odd order. What about those of even order?

12. If $g$ on $R$ is not constant and $g(u) = 1 + o(u) + o(u^2)$ near $u = 0$ with $o(u)$ an odd function, then $g$ is not a ch.f. (Observe that $g(u)g(-u) = 1 + o(u^2)$.)

Examples: e~“, onl” for p > 2,e78%- 1/1 + 4). 

13. Let $g$ on $R$ be real, even and continuous, with $g(0) = 1$, $g(u) \to 0$ as $u \to \infty$.

If $g$ is convex from below on $[0, +\infty)$, then $g$ is a ch.f. (To prove $\int_0^{\infty} g(u)\cos xu\,du \ge 0$ for $x > 0$, observe that on $(0, \infty)$, say, the left-hand side derivative $g'$ exists and is nondecreasing, with $g'(u) \le 0$ and $g'(u) \to 0$ as $u \to \infty$. Set $h = -g'$ so that, by integration by parts,

\[ x\int_0^{\infty} g(u)\cos xu\,du = \int_0^{\infty} h(u)\sin xu\,du. \]

For $x > 0$, the last integral is

\[ \int_0^{\pi/x}\Big\{h(u) - h\Big(u + \frac{\pi}{x}\Big) + h\Big(u + \frac{2\pi}{x}\Big) - h\Big(u + \frac{3\pi}{x}\Big) + \cdots\Big\}\sin xu\,du \ge 0.) \]

Examples: $e^{-|u|}$, $1/(1 + |u|)$, $1 - |u|$ for $|u| \le 1$ and 0 for $|u| > 1$.
14. (a) Two ch.f.'s may coincide on intervals without being identical.
Take $F_1'(x) = \dfrac{1 - \cos x}{\pi x^2}$, hence $f_1(u) = 1 - |u|$ for $|u| \le 1$ and 0 for $|u| > 1$, and take $F_2$ defined by $p_0 = \frac12$, $p_{\pm(2k+1)\pi} = \dfrac{2}{(2k + 1)^2\pi^2}$: $f_2(u)$ is periodic of period two and coincides with $f_1$ on $[-1, +1]$. Or, take $f$ to be a ch.f. of the type described in 13 with $f'$ continuous and strictly increasing on $[0, \infty)$. Replace two arbitrarily small arcs of the graph of $f$ which are symmetric with respect to the $y$-axis by their chords, and compare the function so defined with $f$.

(b) The compositions of a law with either one of two distinct laws may coincide ($f_1f_1 = f_1f_2$).

(c) If $f_n \to f$ on $[-U, +U]$, the same may not be true on $R$.

15. $f$ on $R$ is a ch.f. if, and only if, there exists a sequence $g_n$ such that $\int |g_n(v)|^2\,dv \to 1$ and $\int g_n(u + v)\bar g_n(v)\,dv \to f(u)$ uniformly in every finite interval.
(For the “if” assertion, observe that every integral is positive-definite. For 


the ‘only if” assertion, divide [—, +-] into n® equal subintervals, set F,(—7) 
= 0, F,(m) = 1, F.a = F at the subdivision points, and linear inside every sub- 


interval; set ¢ngn(u) ={- "V F'ale) (x) e* dx with g,(0) = 1. Compute /, and 


observe that fn — f/f.) ” 
16. Let g and 4 be bounded and continuous on R, with #(u) = g(—u), and 
let A(z) be an arbitrary finite function on R. 
If for every finite set 4 of values of u 
1, Syke — MRO) | STD He — MRO), 
then 


\[ h(u) = \int e^{iux}\,dH(x) \]


where H is a d.f. up to a multiplicative constant. 
The foregoing inequalities represent a necessary and sufficient condition for 
g to be of the form 


\[ g(u) = \int e^{iux}\,dG(x) \]

with $|\Delta G| \le \Delta H$. Find the relation between discontinuity and continuity points of $G$ and $H$.
17. The uniqueness and composition properties determine “essentially” the form of ch.f.'s. Let $K$ on $R \times R$ be bounded and continuous. If the functions $g$ on $R$ are defined by $g(u) = \int K(u, x)\,dF(x)$ for every d.f. $F$, and the uniqueness and composition properties hold, then $K(u, x) = e^{ixh(u)}$ and $f(h(u)) = g(u)$.
18. Normal vectors. A normal vector $X = (X_k, k \le n)$ is so defined that all r.v.'s of the form $\sum_k u_kX_k$ are normal. Let the $X_k$ be centered at expectations.
A ch.f. $f$ on $R^n$ is that of a normal vector (centered at its expectation) if, and only if,

\[ \log f(u_1, \cdots, u_n) = Q(u_1, \cdots, u_n) = -\frac12\sum_{j,k} m_{jk}u_ju_k \le 0 \]

where $m_{jk} = EX_jX_k$.
If the inequality is strict, then the normal d.f. is defined by

\[ \frac{\partial^n}{\partial x_1\cdots\partial x_n}F(x_1, \cdots, x_n) = \frac{1}{(2\pi)^{n/2}D^{1/2}}\,e^{-\frac12 q(x_1, \cdots, x_n)} \]

where $D = \|m_{jk}\| > 0$ and $q(x_1, \cdots, x_n) = \dfrac{1}{D}\sum_{j,k} D_{jk}x_jx_k$ is the reciprocal form of $Q(u_1, \cdots, u_n)$ with the variables $x_k$. What if the inequality is not strict?

19. If $(X, Y)$ is a normal pair centered at expectations, then $EXY/\sigma X\sigma Y = \cos p\pi$ where $p = P[XY < 0]$. (Compute $P[XY < 0]$ using the d.f.)
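Complement 14(a) lends itself to a quick numerical check. The sketch below assumes Python and the reading of the example in which the second law has mass $\frac12$ at 0 and mass $2/((2k+1)^2\pi^2)$ at each point $\pm(2k+1)\pi$, $k = 0, 1, \cdots$; the truncation level `terms` and the function names are illustrative.

```python
import math

def f1(u):
    """Triangular ch.f.: 1 - |u| on [-1, 1] and 0 outside."""
    return max(0.0, 1.0 - abs(u))

def f2(u, terms=5000):
    """Truncated ch.f. of the discrete law with mass 1/2 at 0 and
    mass 2 / ((2k+1)^2 pi^2) at each of the points +-(2k+1) pi."""
    total = 0.5
    for k in range(terms):
        odd = 2 * k + 1
        mass = 2.0 / (odd * math.pi) ** 2
        # The two symmetric atoms contribute 2 * mass * cos(odd * pi * u).
        total += 2.0 * mass * math.cos(odd * math.pi * u)
    return total

for u in (0.0, 0.3, 1.0, 1.5, 2.0):
    print(u, f1(u), round(f2(u), 4))
```

The two functions agree on $[-1, +1]$ up to the truncation error, while outside that interval $f_2$ repeats with period two and $f_1$ vanishes.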


Part Three 


INDEPENDENCE 


Until very recently, probability theory could have been defined to 
be the investigation of the concept of independence. This concept con- 
tinues to provide new problems. Also it has originated and continues 
to originate most of the problems where independence is not assumed. 

The main model is that of sequences of sums of independent random 
variables. The main problems are the Strong Central Limit Problem 
and the (Laws) Central Limit Problem. The first is concerned with al- 
most sure convergence and stability properties. The second one is 
concerned with convergence of laws. All general results were obtained 
since 1900. 
Chapter


SUMS OF INDEPENDENT RANDOM VARIABLES 


Two properties play a basic role in the study of independent r.v.’s: 
the Borel zero-one law and the multiplication theorem for expectations. 
Two general a.s. limit problems for sums of independent r.v.’s have been 
investigated: the a.s. convergence problem and the a.s. stability prob- 
lem. Both of them took their present form in the second quarter of 
this century.


§16. CONCEPT OF INDEPENDENCE


CONVENTION. To avoid endless repetitions, we make the convention that, unless otherwise stated,

- r.v.'s, random vectors and, in general, random functions are defined on a fixed but otherwise arbitrary pr. space $(\Omega, \mathcal{A}, P)$.

- indices $t$ vary on a fixed but otherwise arbitrary index set $T$, and events of a class have the index of the class.


16.1 Independent classes and independent functions. Events $A_t$ are said to be independent if, for every finite subset $(t_1, \cdots, t_n)$,

\[ \text{(I)} \qquad P\bigcap_{k=1}^{n} A_{t_k} = \prod_{k=1}^{n} PA_{t_k}. \]

In fact, the concept of independence is relative to families of classes (see Application 1° below).

Classes $\mathcal{C}_t$ of events are said to be independent if their events are independent; in other words, if events selected arbitrarily one from each class are independent. Clearly, if the $\mathcal{C}_t$ are independent so are the $\mathcal{C}'_{t'} \subset \mathcal{C}_{t'}$, $t' \in T' \subset T$. Because of its constant use, we state this fact as a theorem.
A. Subclasses of independent classes are independent.

Let $X_t$ be r.v.'s or random vectors or, in general, random functions. Let $\mathcal{B}(X_t)$ be the sub σ-field of events induced by $X_t$, that is, the inverse image under $X_t$ of the Borel field in the range space of $X_t$.

The $X_t$ are said to be independent if they induce independent σ-fields $\mathcal{B}(X_t)$. Then classes $\mathcal{B}_t \subset \mathcal{B}(X_t)$ are independent. Since a Borel function of $X_t$ induces a sub σ-field $\mathcal{B}_t$ of events contained in $\mathcal{B}(X_t)$, it follows that

A'. BOREL FUNCTIONS THEOREM. Borel functions of independent random functions are independent.


Independent classes can be enlarged, to some extent, without destroying independence. More precisely

Let $\mathcal{C}_t$ be independent classes. Independence is preserved if to every $\mathcal{C}_t$ we adjoin

1° the null and the a.s. events, for (I) is trivially true (both sides reducing to 0) when at least one of the events which figure in it is null, while (I) with $n$ indices reduces to (I) with fewer indices when at least one of the events which figure in it is a.s.;

2° the proper differences of its elements and, in particular, their complements (because of 1°), for if $A_{t_1} \supset A'_{t_1}$, then

\[ P(A_{t_1} - A'_{t_1})A_{t_2}\cdots A_{t_n} = PA_{t_1}A_{t_2}\cdots A_{t_n} - PA'_{t_1}A_{t_2}\cdots A_{t_n} \]
\[ = (PA_{t_1} - PA'_{t_1})PA_{t_2}\cdots PA_{t_n} = P(A_{t_1} - A'_{t_1})PA_{t_2}\cdots PA_{t_n}; \]

3° the countable sums of its elements, for

\[ P\Big(\sum_j A^j_{t_1}\Big)A_{t_2}\cdots A_{t_n} = \sum_j PA^j_{t_1}A_{t_2}\cdots A_{t_n} = \Big(\sum_j PA^j_{t_1}\Big)PA_{t_2}\cdots PA_{t_n} = P\Big(\sum_j A^j_{t_1}\Big)PA_{t_2}\cdots PA_{t_n}; \]

4° the limits of sequences of its elements, for if $A^m_{t_1} \to A_{t_1}$ as $m \to \infty$, then

\[ PA_{t_1}A_{t_2}\cdots A_{t_n} \leftarrow PA^m_{t_1}A_{t_2}\cdots A_{t_n} = PA^m_{t_1}PA_{t_2}\cdots PA_{t_n} \to PA_{t_1}PA_{t_2}\cdots PA_{t_n}. \]

It follows easily that




B. EXTENSION THEOREM. Minimal σ-fields over independent classes $\mathcal{C}_t$ closed under finite intersections are independent.


Applications. 1° If the events $A_t$ are independent, so are the σ-fields $(A_t, A_t^c, \emptyset, \Omega)$.

2° If the inverse images $\mathcal{C}_t$ of the classes of all intervals $(-\infty, x_t)$ in Borel spaces $R_t$ are independent, so are the inverse images $\mathcal{B}_t$ of the Borel fields in the $R_t$. For, every $\mathcal{C}_t$ is closed under finite intersections and $\mathcal{B}_t$ is the minimal σ-field over $\mathcal{C}_t$.

*3° Let $\mathcal{B}_t$ be σ-fields (or fields) of events and let $T_s$ be a subset of the index set $T$. The compound σ-field $\mathcal{B}_{T_s}$ with components $\mathcal{B}_t$, $t \in T_s$, is the minimal σ-field over the class $\mathcal{C}_{T_s}$ of all finite intersections of events $A_t$, $t \in T_s$, and contains all its components; since the $\mathcal{B}_t$ are closed under finite intersections so is $\mathcal{C}_{T_s}$. $\mathcal{B}_{T_s}$ is a compound sub σ-field of $\mathcal{B}_T$ and, if $T_s$ is finite, then $\mathcal{B}_{T_s}$ is a “finitely compound” sub σ-field.
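Application 1° can be verified exhaustively on a small finite pr. space. The sketch below assumes Python and two tosses of a fair coin, with exact rational probabilities; the events `a` and `b` and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space of two tosses of a fair coin, all four outcomes equally likely.
omega = frozenset(product("HT", repeat=2))

def prob(event):
    """Probability of an event under the uniform law on omega."""
    return Fraction(len(event), len(omega))

a = frozenset(w for w in omega if w[0] == "H")  # first toss is heads
b = frozenset(w for w in omega if w[1] == "H")  # second toss is heads

def sigma_field(event):
    """The four-element field generated by a single event."""
    return [event, omega - event, frozenset(), omega]

def independent(class1, class2):
    """True if every pair of events, one from each class, satisfies (I)."""
    return all(prob(e & f) == prob(e) * prob(f) for e in class1 for f in class2)

print(independent(sigma_field(a), sigma_field(b)))
```

By contrast, `a` and the event "both tosses are heads" are not independent, since the intersection has probability $\frac14$ while the product of the probabilities is $\frac18$.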


If compound σ-fields are independent, then, by A, their finitely compound sub σ-fields are independent. Conversely, if the finitely compound sub σ-fields are independent, then, by the extension theorem, the compound σ-fields are independent. We state these facts as a theorem.

C. COMPOUNDS THEOREM. Compound σ-fields are independent if, and only if, their finitely compound sub σ-fields are independent.

In particular, if the $\mathcal{B}_t$ are independent, so are the $\mathcal{B}_{T_s}$ for every partition of $T$ into sets $T_s$.

Families $X_{T_s} = \{X_t, t \in T_s\}$ of r.v.'s induce sub σ-fields $\mathcal{B}(X_{T_s})$ of events. Every $\mathcal{B}(X_{T_s})$ is the minimal σ-field over the class $\mathcal{C}(X_{T_s})$ of inverse images of all intervals in the range space $R_{T_s}$ of $X_{T_s}$. But the intervals in the Borel space $R_{T_s}$ are products of intervals in the factor spaces $R_t$, with only a finite number of factor intervals different from the whole factor spaces, and the inverse image of any factor space is $\Omega$. Therefore the elements of $\mathcal{C}(X_{T_s})$ are all the finite intersections of elements of the $\mathcal{B}(X_t)$. It follows that the σ-field $\mathcal{B}(X_{T_s})$ is a compound of the σ-fields $\mathcal{B}(X_t)$, and theorem C becomes

C'. FAMILIES THEOREM. Families of random variables are independent if, and only if, their finite subfamilies are mutually independent.

Thus, in the last analysis, independence of random functions reduces to independence of random vectors.

To conclude this investigation of the definition of independence, let us observe that all which precedes applies to complex r.v.'s, to complex random vectors, and, in general, to complex random functions $X_t = X'_t + iX''_t$, considered as vector random functions $(X'_t, X''_t)$, $t \in T$.
16.2 Multiplication properties. The direct definition of independent r.v.'s is as follows:

Random variables $X_t$, $t \in T$, are independent if, for every finite class $(S_{t_1}, \cdots, S_{t_n})$ of Borel sets in $R$,

\[ P\bigcap_{k=1}^{n}[X_{t_k} \in S_{t_k}] = \prod_{k=1}^{n} P[X_{t_k} \in S_{t_k}]. \]

The basic expectation property of independent r.v.'s is expressed by

a. MULTIPLICATION LEMMA. If $X_1, \cdots, X_n$ are independent nonnegative r.v.'s, then $E\prod_{k=1}^{n} X_k = \prod_{k=1}^{n} EX_k$.


Proof. It suffices to prove the assertion for two independent r.v.'s $X$ and $Y$, for then the general case follows by induction. First, let $X = \sum_j x_jI_{A_j}$ and $Y = \sum_k y_kI_{B_k}$ be nonnegative simple (or elementary) r.v.'s; we can always take the $x_j$, and, similarly, the $y_k$, to be all distinct, so that $A_j = [X = x_j]$, $B_k = [Y = y_k]$. Since $X$ and $Y$ are independent, $PA_jB_k = PA_jPB_k$ and, hence,

\[ EXY = \sum_{j,k} x_jy_kPA_jPB_k = \sum_j x_jPA_j \cdot \sum_k y_kPB_k = EXEY. \]

Now, let $X$ and $Y$ be nonnegative r.v.'s and set

\[ A_{nk} = \Big[\frac{k - 1}{2^n} \le X < \frac{k}{2^n}\Big], \qquad B_{nk} = \Big[\frac{k - 1}{2^n} \le Y < \frac{k}{2^n}\Big]. \]

Since $X$ and $Y$ are independent so are these events and, hence, so are the simple r.v.'s

\[ X_n = \sum_{k=1}^{n2^n}\frac{k - 1}{2^n}I_{A_{nk}}, \qquad Y_n = \sum_{k=1}^{n2^n}\frac{k - 1}{2^n}I_{B_{nk}}. \]

But $0 \le X_n \uparrow X$, $0 \le Y_n \uparrow Y$, so that $0 \le X_nY_n \uparrow XY$ and, by what precedes, $EX_nY_n = EX_nEY_n$. Therefore, by the monotone convergence theorem, $EXY = EXEY$, and the lemma is proved.
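The approximation step of the proof can be followed on a small example with exact arithmetic. The sketch below assumes Python, the dyadic simple r.v.'s $X_n$ taking the value $(k-1)/2^n$ on $[(k-1)/2^n \le X < k/2^n]$ for $k \le n2^n$ and 0 elsewhere, and two illustrative finite laws for an independent pair $(X, Y)$.

```python
from fractions import Fraction
import math

def dyadic(x, n):
    """Value of X_n at a point where X = x: (k-1)/2^n when x lies in
    [(k-1)/2^n, k/2^n) for some k <= n 2^n, and 0 when x >= n."""
    if x >= n:
        return Fraction(0)
    return Fraction(math.floor(x * 2**n), 2**n)

# Illustrative laws (value -> probability) of the independent r.v.'s X and Y.
law_x = {Fraction(1, 3): Fraction(1, 2), Fraction(5, 4): Fraction(1, 4),
         Fraction(2): Fraction(1, 4)}
law_y = {Fraction(1, 7): Fraction(2, 3), Fraction(3): Fraction(1, 3)}

def expectation(law, n):
    """E X_n computed from the law of X."""
    return sum(p * dyadic(x, n) for x, p in law.items())

def product_expectation(n):
    """E(X_n Y_n) computed on the product law of the independent pair."""
    return sum(p * q * dyadic(x, n) * dyadic(y, n)
               for x, p in law_x.items() for y, q in law_y.items())

for n in (1, 4, 12):
    print(n, product_expectation(n), expectation(law_x, n) * expectation(law_y, n))
```

With rational values the equality $EX_nY_n = EX_nEY_n$ holds exactly for every $n$, and the approximations increase to the values of $X$ and $Y$.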


A. MULTIPLICATION THEOREM. Let $X_1, \cdots, X_n$ be independent r.v.'s. If these r.v.'s are integrable so is their product, and $E\prod_{k=1}^{n} X_k = \prod_{k=1}^{n} EX_k$. Conversely, if their product is integrable and none is degenerate at 0, then they are integrable.

Proof. It suffices to prove the assertion for two independent r.v.'s $X$ and $Y$. We observe that independence of $X$ and $Y$ implies that the nonnegative r.v.'s $X' = X^+$ or $X^-$ or $|X|$ and $Y' = Y^+$ or $Y^-$ or $|Y|$ are independent, so that, by a, $EX'Y' = EX'EY'$. Now, if $X$ and $Y$ are integrable so are $X'$ and $Y'$ and, by the foregoing equality, so is $X'Y'$. Therefore $|XY|$ and hence $XY$ are integrable and, by the same equality,

\[ EXY = E(X^+ - X^-)(Y^+ - Y^-) = EX^+EY^+ - EX^+EY^- - EX^-EY^+ + EX^-EY^- = EXEY. \]

Conversely, if $XY$ is integrable so that $E|X|E|Y| = E|XY| < \infty$, and neither $X$ nor $Y$ degenerates at 0 so that $E|X|$ and $E|Y|$ do not vanish, then $E|X|$ and $E|Y|$ are finite, and the proof is concluded.
Extension. The multiplication theorem remains valid for independent complex r.v.'s $X_k = X'_k + iX''_k$, since it applies to every term of the expansion of $\prod_{k=1}^{n}(X'_k + iX''_k)$. In particular, according to the Borel functions theorem, if the $X_k$ are independent so are the $e^{iuX_k}$ and, hence,

\[ Ee^{iu\sum_{k=1}^{n} X_k} = E\prod_{k=1}^{n} e^{iuX_k} = \prod_{k=1}^{n} Ee^{iuX_k}. \]

In other words,

COROLLARY. Ch.f.'s of sums of independent r.v.'s are products of ch.f.'s of the summands.
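The corollary is easy to check on discrete laws, where both sides are finite sums. The sketch below assumes Python; the two laws and the helper names `chf` and `compose` are illustrative.

```python
import cmath

def chf(law, u):
    """Ch.f. at u of a discrete law given as {value: probability}."""
    return sum(p * cmath.exp(1j * u * x) for x, p in law.items())

def compose(law1, law2):
    """Law of X + Y for independent X and Y with the given laws."""
    out = {}
    for x, p in law1.items():
        for y, q in law2.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

law_x = {0: 0.3, 1: 0.7}
law_y = {-1: 0.2, 0: 0.5, 2: 0.3}
law_sum = compose(law_x, law_y)

for u in (0.0, 0.4, 1.0, 2.5):
    print(u, abs(chf(law_sum, u) - chf(law_x, u) * chf(law_y, u)))
```

The printed differences are zero up to rounding, in agreement with the product rule.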


This proposition, to be used extensively in the following chapter, is but a special case of a property which can serve as an equivalent definition of independent r.v.'s, as follows:

Let $F_t$ and $f_t$, $F_{t_1 \cdots t_n}$ and $f_{t_1 \cdots t_n}$ be the d.f.'s and ch.f.'s of the r.v. $X_t$ and of the random vector $(X_{t_1}, \cdots, X_{t_n})$, respectively.

B. EQUIVALENCE THEOREM. The three following definitions of independence of the r.v.'s $X_t$ are equivalent.
For every finite class of Borel sets $S_t$ and of points $x_t, u_t \in R$

\[ (\mathrm{I}_1) \qquad P\bigcap_{k=1}^{n}[X_{t_k} \in S_{t_k}] = \prod_{k=1}^{n} P[X_{t_k} \in S_{t_k}] \]
\[ (\mathrm{I}_2) \qquad F_{t_1 \cdots t_n}(x_{t_1}, \cdots, x_{t_n}) = F_{t_1}(x_{t_1}) \cdots F_{t_n}(x_{t_n}) \]
\[ (\mathrm{I}_3) \qquad f_{t_1 \cdots t_n}(u_{t_1}, \cdots, u_{t_n}) = f_{t_1}(u_{t_1}) \cdots f_{t_n}(u_{t_n}) \]
Proof. $(\mathrm{I}_1)$ implies $(\mathrm{I}_2)$ by taking $S_t = (-\infty, x_t)$. Conversely, $(\mathrm{I}_2)$ implies $(\mathrm{I}_1)$ with $S_t = (-\infty, x_t)$ and, on account of 16.1, Application 2°, this implies $(\mathrm{I}_1)$ for all $S_t$.

$(\mathrm{I}_2)$ implies $(\mathrm{I}_3)$, for $(\mathrm{I}_2)$ implies $(\mathrm{I}_1)$ which implies $(\mathrm{I}_3)$ exactly as the multiplication theorem implies its corollary. Conversely, $(\mathrm{I}_3)$ implies $(\mathrm{I}_2)$, for the inversion formula for one- and multi-dimensional ch.f.'s shows at once that if $(\mathrm{I}_3)$ is true, then, for all continuity intervals,

\[ F_{t_1 \cdots t_n}[a_{t_1}, \cdots, a_{t_n}; b_{t_1}, \cdots, b_{t_n}) = F_{t_1}[a_{t_1}, b_{t_1}) \cdots F_{t_n}[a_{t_n}, b_{t_n}), \]

and $(\mathrm{I}_2)$ follows by letting the $a_t \to -\infty$ and $b_t \uparrow x_t$. This completes the proof.

Extension. The equivalence theorem is valid when the $X_t$ are random vectors, for the proof applies word by word, provided $R$ is replaced by the range space $R_t$ of $X_t$.

16.3 Sequences of independent r.v.’s. At the root of known a.s. 
limit properties of sequences of independent r.v.’s lies the celebrated 


A. BOREL ZERO-ONE CRITERION. If the events $A_n$ are independent,
then $P(\limsup A_n) = 0$ or $1$ according as $\sum PA_n < \infty$ or $= \infty$.


Proof. Since

$$P(\limsup A_n) = \lim_m \lim_n P \bigcup_{k=m}^{n} A_k = \lim_m \lim_n \Bigl(1 - P \bigcap_{k=m}^{n} A_k^c\Bigr)$$

and the events $A_k$ and hence $A_k^c$ are independent, the assertion
follows by passing to the limit in the elementary inequality

$$1 - \exp\Bigl[-\sum_{k=m}^{n} PA_k\Bigr] \le 1 - \prod_{k=m}^{n} (1 - PA_k) \le \sum_{k=m}^{n} PA_k.$$

Since, whatever be the events $A_n$, $\sum PA_n < \infty$ implies that

$$\lim_m \lim_n P \bigcup_{k=m}^{n} A_k \le \lim_m \lim_n \sum_{k=m}^{n} PA_k = 0,$$

the "zero" part of this criterion is valid with no assumption of
independence:


a. BOREL-CANTELLI LEMMA. If $\sum PA_n < \infty$, then $P(\limsup A_n) = 0$.
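
The elementary inequality used in the proof, and the dichotomy it produces, can be checked numerically. The following is a minimal sketch, assuming independent events with prescribed probabilities; the helper names `union_bounds` and `tail_union` and the two choices of $PA_k$ ($1/k^2$ and $1/k$) are illustrative and not taken from the text.

```python
import math

def union_bounds(probs):
    """Return (lower, exact, upper) for P(A_m ∪ ... ∪ A_n), independent A_k."""
    total = sum(probs)
    none_occur = math.prod(1.0 - p for p in probs)
    return 1.0 - math.exp(-total), 1.0 - none_occur, total

def tail_union(p, m, n):
    """Exact P(A_m ∪ ... ∪ A_n) when P(A_k) = p(k) and the A_k are independent."""
    return union_bounds([p(k) for k in range(m, n + 1)])[1]

# Summable case: the tail unions are small, in line with P(A_n i.o.) = 0.
summable = tail_union(lambda k: 1.0 / k**2, 100, 100_000)
# Non-summable case: the tail unions stay near 1, in line with P(A_n i.o.) = 1.
divergent = tail_union(lambda k: 1.0 / k, 100, 100_000)
```

The finite ranges only approximate the double limit in the proof, so the two numbers suggest the zero-one behavior rather than prove it.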


COROLLARY 1. If the events $A_n$ are independent and $A_n \to A$, then
$PA = 0$ or $1$.


COROLLARY 2. If the r.v.'s $X_n$ are independent and $X_n \to 0$ a.s., then
$\sum P[|X_n| \ge c] < \infty$ whatever be the finite number $c > 0$.

For $X_n \to 0$ a.s. implies that, if $A_n = [|X_n| \ge c]$, then
$P(\limsup A_n) = 0$, and independence of the $X_n$ implies that of the $A_n$.


Because of its intuitive appeal, instead of "$\limsup A_n$" we shall
sometimes write "$A_n$ i.o."; to be read "$A_n$'s occur infinitely often" or
"infinitely many $A_n$ occur." This terminology corresponds to the fact
that $\limsup A_n$ is the set of all those elementary events which belong
to infinitely many $A_n$ or, equivalently, to some of "the $A_n, A_{n+1}, \dots$
however large be $n$," that is, to the "tail" of the sequence $A_n$. To the
"tail" of the sequence $A_n$ of events corresponds the "tail" of the
sequence $I_{A_n}$ of their indicators. More generally, the "tail" of a
sequence $X_n$ of r.v.'s is "the sequence $X_n, X_{n+1}, \dots$ however large
be $n$."

To be precise, let $X_1, X_2, \dots$ be a sequence of r.v.'s and let
$\mathcal{B}(X_n)$, $\mathcal{B}(X_n, X_{n+1})$, $\dots$,
$\mathcal{B}(X_n, X_{n+1}, \dots)$, $\mathcal{B}(X_{n+1}, X_{n+2}, \dots)$, $\dots$ be
sub σ-fields of events induced by the random functions within the
brackets. We give a precise meaning to $\limsup \mathcal{B}(X_n)$, as follows:
The sequence $\mathcal{B}(X_n)$, $\mathcal{B}(X_n, X_{n+1})$, $\dots$ is a
nondecreasing sequence of σ-fields, its supremum or union is a field, and
the minimal σ-field over this field is $\mathcal{B}(X_n, X_{n+1}, \dots)$ or,
writing loosely, "$\sup_{m \ge n} \mathcal{B}(X_m)$." In turn, the sequence
$\mathcal{B}(X_n, X_{n+1}, \dots)$, $\mathcal{B}(X_{n+1}, X_{n+2}, \dots)$, $\dots$ is a
nonincreasing sequence of σ-fields and its limit or intersection is a
σ-field $\mathcal{C}$ contained in $\mathcal{B}(X_n, X_{n+1}, \dots)$ however large
be $n$ or, writing loosely, "$\limsup \mathcal{B}(X_n)$." The σ-field
$\mathcal{C}$ will be called the tail σ-field of the sequence $X_n$ or "the
sub σ-field of events induced by the tail of the sequence $X_n$." Let us
observe that all the foregoing σ-fields and, in particular, the tail
σ-field, are contained in the σ-field $\mathcal{B}(X_1, X_2, \dots)$ induced by
the whole sequence $X_n$. The elements of the tail σ-field $\mathcal{C}$ are
tail events and the numerical (finite or not) $\mathcal{C}$-measurable
functions, that is, those functions which induce sub σ-fields of events
contained in $\mathcal{C}$, are tail functions: they are defined on the "tail"
of the sequence. For example, the limits inferior and superior of the
sequence $X_n$ and of the sequence $(X_1 + X_2 + \cdots + X_n)/b_n$, where
$b_n \to \infty$, are tail functions (not necessarily finite), while the sets
of convergence of these sequences, as well as the set of convergence of
the series $\sum X_n$, are tail events.
To Borel's result corresponds the basic Kolmogorov's


B. ZERO-ONE LAW. On a sequence of independent r.v.'s, the tail events
have for pr. either 0 or 1 and the tail functions are degenerate.


In other words, the tail σ-field of a sequence of independent r.v.'s is
equivalent to $\{\emptyset, \Omega\}$.




Proof. We observe that an event $A$ is independent of itself if, and
only if, $PAA = PA \cdot PA$, that is, if $PA = 0$ or $1$, and such events
are mutually independent. Thus, the first assertion means that the tail
σ-field $\mathcal{C}$ of the sequence $X_n$ of independent r.v.'s is
independent of itself. Since $\mathcal{C} \subset \mathcal{B}(X_{n+1}, X_{n+2}, \dots)$
whatever be $n$ and, because of the independence assumption,
$\mathcal{B}(X_1, \dots, X_n)$ is independent of $\mathcal{B}(X_{n+1}, X_{n+2}, \dots)$,
it follows that $\mathcal{C}$ is independent of $\mathcal{B}(X_1, X_2, \dots, X_n)$
whatever be $n$. Therefore, $\mathcal{C}$ is independent of
$\mathcal{B}(X_1, X_2, \dots)$ and, being contained in $\mathcal{B}(X_1, X_2, \dots)$,
it is independent of itself. This proves the first assertion and the
second follows, since, if $X$ is a tail function, then it is a.s.
$\{\emptyset, \Omega\}$-measurable hence degenerates.


COROLLARY. If $X_n$ are independent r.v.'s, then the sequence $X_n$ either
converges a.s. or diverges a.s.; and similarly for the series $\sum X_n$.
Moreover, the limits of the sequences $X_n$ and $(X_1 + \cdots + X_n)/b_n$,
where $b_n \uparrow \infty$, are degenerate.


*16.4 Independent r.v.'s and product spaces. Let $X_t$, where $t$ runs
over an index set $T$, be independent r.v.'s with d.f.'s $F_{X_t}$ on $R_t$.
Because of the correspondence theorem, every $F_{X_t}$ determines a pr.
$P_{X_t}$ on the Borel field $\mathcal{B}_t$ in $R_t$. On account of the
product-measure theorem, the $P_{X_t}$ determine a product-measure
$\prod P_{X_t}$ on the product Borel field $\prod \mathcal{B}_t$ in the product
space $\prod R_t$. On the other hand, the law of the family
$X = \{X_t, t \in T\}$, represented by the family of d.f.'s
$\{F_{X_{t_1}, \dots, X_{t_N}}\}$ of all finite subfamilies of $X$ determines, by
the correspondence theorem, a family $\{P_{X_{t_1}, \dots, X_{t_N}}\}$ of
consistent measures on the product Borel fields
$\prod_{k=1}^{N} \mathcal{B}_{t_k}$. Owing to the consistent measures theorem,
this family of pr.'s determines a pr. $P_X$ on $\prod \mathcal{B}_t$.
Since the $X_t$ are independent,

$$F_{X_{t_1}, \dots, X_{t_N}} = F_{X_{t_1}} \times \cdots \times F_{X_{t_N}}$$

so that

$$P_{X_{t_1}, \dots, X_{t_N}} = P_{X_{t_1}} \times \cdots \times P_{X_{t_N}}$$

and, therefore, $P_X$ coincides with $\prod P_{X_t}$. In other words,


A. The pr. space induced on its range space by a family of independent
r.v.'s is the product of pr. spaces induced on their respective range spaces
by the r.v.'s of the family.


Let us observe that this reduces the multiplication theorem to the
Fubini theorem.

The question arises whether the converse is true: Given a product
pr. space $(\prod R_t, \prod \mathcal{B}_t, \prod P_t)$, is there a family
$\{X_t, t \in T\}$ of independent r.v.'s on some pr. space
$(\Omega, \mathcal{A}, P)$ which induces this product pr. space? Equivalently,
given a family $\{F_t, t \in T\}$ of d.f.'s with variation 1, is there a
family $\{X_t, t \in T\}$ of independent r.v.'s with $F_{X_t} = F_t$?

If the pr. space on which the r.v.’s have to be defined is fixed, then, 
in general, the answer is in the negative, since on a fixed pr. space even 
one r.v. with a given d.f. might not exist. However, if we are at lib- 
erty to select the pr. space on which to define r.v.’s, and we shall always 
do so, then the answer is in the affirmative, as follows: 

Let the pr. space be the product pr. space
$(\prod R_t, \prod \mathcal{B}_t, \prod P_t)$ where, if the $F_t$ are given, the
$P_t$ are determined upon applying the correspondence theorem. The r.v.'s
$X_t$, defined on this pr. space by $X_t(x) = x_t$, $x = \{x_t, t \in T\}$,
are then independent, since their pr.d.'s are $P_t$ and their d.f.'s are
$F_t$. Thus


B. The relation $X_t(x) = x_t$, $x = \{x_t, t \in T\}$, establishes a
one-to-one correspondence between families $\{X_t\}$ of independent r.v.'s
and product pr. spaces on $\prod R_t$.


REMARK. There exist pr. spaces on which can be defined all possible
families of independent r.v.'s with a given index set $T$. For example,
take the pr. space $(\Omega, \mathcal{A}, P)$ where $\Omega = \prod \Omega_t$ with
$\Omega_t = (0, 1)$ and $P = \prod P_t$ on the Borel field $\mathcal{A}$ in
$\Omega$, with $P_t$ being the Lebesgue measure on the Borel field in
$\Omega_t$ (class of Borel sets in $\Omega_t$). Then the r.v.'s $X_t$, inverse
functions of arbitrarily given d.f.'s $F_t$, are independent and
$F_{X_t} = F_t$.
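
The construction in the Remark, a r.v. obtained as the inverse function of a given d.f. on $(0, 1)$ with Lebesgue measure, can be sketched for a finite law. This is an illustration under stated assumptions: the d.f. is taken in the form $P[X \le x]$ (the text's own convention may be the left-continuous one), Lebesgue measure is approximated by a uniform grid of midpoints, and the names `make_inverse_df` and `pushforward` are hypothetical.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

def make_inverse_df(values, probs):
    """Generalized inverse u -> min{x : P[X <= x] >= u} of a finite law.

    values must be sorted increasingly; probs are exact Fractions summing to 1.
    """
    cumulative = list(accumulate(probs))
    def inverse(u):
        return values[bisect_left(cumulative, u)]
    return inverse

def pushforward(inverse, n):
    """Law of inverse(U) when U runs over the n midpoints (2i + 1) / (2n)."""
    law = {}
    for i in range(n):
        x = inverse(Fraction(2 * i + 1, 2 * n))
        law[x] = law.get(x, Fraction(0)) + Fraction(1, n)
    return law

probs = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
inverse = make_inverse_df([0, 1, 2], probs)
law = pushforward(inverse, 1000)  # recovers the prescribed probabilities
```

Applying different inverse functions to different coordinates of a point of $\prod \Omega_t$ would give the independent family of the Remark; the sketch only shows one coordinate.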

Extension. The preceding considerations apply, word for word, to 
random vectors. They also apply to arbitrary random functions, pro- 
vided we consider that the d.f. of a random function is defined in terms 
of its “‘finite sections,” that is, the family of d.f.’s of projections of the 
random function on finite subspaces. 


§ 17. CONVERGENCE AND STABILITY OF SUMS; CENTERING AT 
EXPECTATIONS AND TRUNCATION 


This section and the following one are devoted to the investigation
of sums $S_n = \sum_{k=1}^{n} X_k$ of independent r.v.'s $X_1, X_2, \dots$ and,
especially, of their limit properties: convergence to r.v.'s and
stability.

Given two numerical sequences $a_n$ and $b_n \uparrow \infty$, we say that the
sequence $S_n$ is stable in pr. or a.s. if

$$\frac{S_n}{b_n} - a_n \to 0 \ \text{in pr.} \quad \text{or} \quad \frac{S_n}{b_n} - a_n \to 0 \ \text{a.s.}$$

In fact, a stability property is at the root of the whole development of
pr. theory. If $X_1, X_2, \dots$ are independent and identically distributed
indicators with $P[X_n = 1] = p$ and $P[X_n = 0] = q = 1 - p$, we have
the Bernoulli case. The first stability property is the


BERNOULLI LAW OF LARGE NUMBERS: In the Bernoulli case

$$\frac{S_n}{n} - p \to 0 \ \text{in pr.}$$

The Central Limit Problem, to which the following chapter is devoted, 
is the direct descendant of its sharpening by de Moivre and by Laplace. 
On the other hand, the following strengthening 


BOREL STRONG LAW OF LARGE NUMBERS: In the Bernoulli case

$$\frac{S_n}{n} - p \to 0 \ \text{a.s.},$$


is at the origin of the results given in this chapter. Perhaps the im- 
portance of the methods overwhelms that of the results and emphasis 
will be laid upon the methods. These methods are (1) centering at ex- 
pectations and truncation and (2) centering at medians and symmetri- 
zation. 

17.1 Centering at expectations and truncation. We say that we cen- 
ter X at c if we replace X by X —c. If X is integrable, then we can 
center it at its expectation EX and, thus, X is replaced by X — EX. 
In other words, a r.v. is centered at its expectation if, and only if, its ex- 
pectation exists and equals 0. 

Let $X$ be integrable. The second moment of $X - EX$ is called
variance of $X$; it exists but may be infinite and will be denoted by
$\sigma^2 X$. Thus

$$\sigma^2 X = E(X - EX)^2 = EX^2 - (EX)^2.$$

Since, for every finite $c$, we have

$$\sigma^2(X - c) = E(X - c - E(X - c))^2 = E(X - EX)^2,$$


centerings do not modify variances. 
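
As a quick check of the two identities above, here is a minimal sketch on a finite law with exact rational arithmetic. The dictionary representation `{value: probability}` and the helper names are assumptions made for the illustration.

```python
from fractions import Fraction

def expectation(law, g=lambda x: x):
    """E g(X) for a finite law given as {value: probability}."""
    return sum(p * g(x) for x, p in law.items())

def variance(law):
    """sigma^2 X = E(X - EX)^2."""
    m = expectation(law)
    return expectation(law, lambda x: (x - m) ** 2)

def center(law, c):
    """Law of X - c."""
    return {x - c: p for x, p in law.items()}

law = {0: Fraction(1, 4), 1: Fraction(1, 2), 4: Fraction(1, 4)}
var_x = variance(law)                      # 9/4
var_shifted = variance(center(law, 7))     # unchanged by centering at 7
second_moment_form = expectation(law, lambda x: x * x) - expectation(law) ** 2
```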

The importance of variances is due to the fact that we have at our 
disposal bounds, in terms of variances of summands, of pr.’s of events 
defined in terms of sums S,, of independent r.v.’s; we shall find and use 
such bounds in this section. However, variances can be introduced
only when the summands are integrable. Moreover, the bounds
mentioned above are nontrivial only when the variances are finite. This
seems to limit the use of such bounds to square-integrable summands. 
Yet this obstacle can be overcome by means of the truncation method. 

We truncate $X$ at $c > 0$ (finite) when we replace $X$ by $X^c = X$ or $0$
according as $|X| < c$ or $|X| \ge c$, and $X^c$ is $X$ truncated at $c$. It
follows that, if $F$ is the d.f. of $X$, then all moments of $X^c$

$$EX^c = \int_{|x| < c} x \, dF, \quad E(X^c)^2 = \int_{|x| < c} x^2 \, dF, \quad \text{etc.},$$

exist and are finite. We can always select $c$ sufficiently large so as to
make $P[X \ne X^c] = P[|X| \ge c]$ arbitrarily small. Furthermore, we
can always select the $c_j$ sufficiently large so as to make
$P \bigcup [X_j \ne X_j^{c_j}]$ arbitrarily small, since, given
$\varepsilon > 0$, we have

$$P \bigcup [X_j \ne X_j^{c_j}] \le \sum P[|X_j| \ge c_j] < \varepsilon$$

if, say, the $c_j$ are selected so as to make
$P[|X_j| \ge c_j] < \varepsilon / 2^j$. Thus, to


every countable family of r.v.’s we can make correspond a family of 
bounded r.v.’s which differs from the first on an event of arbitrarily 
small pr. Moreover, if we are interested primarily in limit properties 
there is no need for arbitrarily small pr., for the following reasons. 

Let two sequences $X_n$ and $X'_n$ of r.v.'s be called tail-equivalent if
they differ a.s. only by a finite number of terms; in other words, if for
a.e. $\omega \in \Omega$ there exists a finite number $n(\omega)$ such that for
$n \ge n(\omega)$ the two sequences $X_n(\omega)$ and $X'_n(\omega)$ are the
same; in symbols $P[X_n \ne X'_n \text{ i.o.}] = 0$. If the sequences $X_n$
and $X'_n$ only converge on the same event up to some null subset, then we
say that they are convergence equivalent.

Let $S_n = \sum_{k=1}^{n} X_k$ and $S'_n = \sum_{k=1}^{n} X'_k$. Since

$$P[X_n \ne X'_n \text{ i.o.}] = \lim_n P \bigcup_{k \ge n} [X_k \ne X'_k] \le \lim_n \sum_{k \ge n} P[X_k \ne X'_k]$$

it follows that


a. EQUIVALENCE LEMMA. If the series $\sum P[X_n \ne X'_n]$ converges,
then the sequences $X_n$ and $X'_n$ are tail-equivalent and, hence, the
series $\sum X_n$ and $\sum X'_n$ are convergence-equivalent and the
sequences $S_n / b_n$ and $S'_n / b_n$, where $b_n \uparrow \infty$, converge on
the same event and to the same limit, excluding a null event.

17.2 Bounds in terms of variances. To avoid repetitions, we make
the convention that, unless otherwise stated, $S_0 = 0$,
$S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \dots$, and the summands
$X_1, X_2, \dots$ are independent r.v.'s.

Let $X_1, X_2, \dots$ be integrable. Since centerings do not modify the
variances, we can assume, when computing variances, that these r.v.'s
are centered at expectations. Then

$$\sigma^2 S_n = ES_n^2 = \sum_{k=1}^{n} EX_k^2 + \sum_{\substack{j,k=1 \\ j \ne k}}^{n} EX_j X_k = \sum_{k=1}^{n} \sigma^2 X_k,$$

since independence of $X_j$ and $X_k$ entails, by 15.2,

$$EX_j X_k = EX_j \cdot EX_k = 0.$$

Thus, we obtain the classical


BIENAYMÉ EQUALITY. If the r.v.'s $X_n$ are independent and integrable,
then

$$\sigma^2 S_n = \sum_{k=1}^{n} \sigma^2 X_k.$$
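
The Bienaymé equality can be verified exactly on small finite laws by building the law of the sum from the product law. The sketch below assumes independent summands with finite laws given as `{value: probability}`; `law_of_sum` and `variance` are illustrative helper names.

```python
from fractions import Fraction
from itertools import product

def variance(law):
    """sigma^2 X for a finite law {value: probability}."""
    m = sum(p * x for x, p in law.items())
    return sum(p * (x - m) ** 2 for x, p in law.items())

def law_of_sum(laws):
    """Law of X_1 + ... + X_n for independent X_k with the given finite laws."""
    out = {}
    for combo in product(*(law.items() for law in laws)):
        s = sum(x for x, _ in combo)
        p = Fraction(1)
        for _, q in combo:
            p *= q  # independence: joint probability is the product
        out[s] = out.get(s, Fraction(0)) + p
    return out

laws = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {-1: Fraction(1, 3), 2: Fraction(2, 3)},
    {0: Fraction(1, 4), 3: Fraction(3, 4)},
]
var_of_sum = variance(law_of_sum(laws))
sum_of_vars = sum(variance(law) for law in laws)
```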
The basic inequalities 9.3A become

$$\text{a.} \quad \frac{\sum_{k=1}^{n} \sigma^2 X_k - \varepsilon^2}{\text{a.s.} \sup (S_n - ES_n)^2} \le P[|S_n - ES_n| \ge \varepsilon] \le \frac{1}{\varepsilon^2} \sum_{k=1}^{n} \sigma^2 X_k.$$

The right-hand side inequality is the celebrated BIENAYMÉ-TCHEBICHEV
INEQUALITY. Applied to $(S_{n+k} - ES_{n+k}) - (S_n - ES_n)$ and to
$S_n - ES_n$ with $\varepsilon$ replaced by $\varepsilon b_n$, it yields, by
passage to the limit,


b. If the series $\sum \sigma^2 X_n$ converges, then the series
$\sum (X_n - EX_n)$ converges in pr. If
$\frac{1}{b_n^2} \sum_{k=1}^{n} \sigma^2 X_k \to 0$, then
$\frac{S_n - ES_n}{b_n} \to 0$ in pr.

This last property is due to Tchebichev (when $b_n = n$). In the
Bernoulli case, where $b_n = n$, $EX_n = p$, $\sigma^2 X_n = pq$, it reduces
to the Bernoulli law of large numbers. It is of some interest to observe
that Borel's strengthening can also be obtained by means of the
Bienaymé-Tchebichev inequality (see Introductory part).


So far, the assumption of independence was used only to establish
that the summands were orthogonal, that is, $EX_j X_k = 0$ $(j \ne k)$ when
$X_j$ and $X_k$ are centered at expectations. In fact, the foregoing results
remain valid under the even less restrictive assumption of orthogonality
of $S_{n-1}$ and $X_n$, $n = 1, 2, \dots$, since, then,

$$\sigma^2 S_n = \sigma^2 S_{n-1} + \sigma^2 X_n,$$

and the Bienaymé equality follows by induction.

But, in the case of independence, the r.v.'s $S_{n-1} I_{A_{n-1}}$ and $X_n$
are orthogonal, not only for $A_{n-1} = \Omega$ but also for every event
$A_{n-1}$ defined in terms of $X_1, X_2, \dots, X_{n-1}$. Therefore, it is to
be expected that the foregoing results can be strengthened by using more
completely the properties of independence, in particular the
orthogonality property just mentioned.


A. KOLMOGOROV INEQUALITIES. If the independent r.v.'s $X_k$ are
integrable and the $|X_k| \le c$ finite or not, then, for every
$\varepsilon > 0$,

$$1 - \frac{(\varepsilon + 2c)^2}{\sum_{k=1}^{n} \sigma^2 X_k} \le P\Bigl[\max_{k \le n} |S_k - ES_k| \ge \varepsilon\Bigr] \le \frac{1}{\varepsilon^2} \sum_{k=1}^{n} \sigma^2 X_k.$$

If one of the variances is infinite, then the right-hand side inequality
is trivial and the left-hand side inequality has no content (for, then,
$c = \infty$), so that we assume that all variances are finite. In that case,
the left-hand side inequality is trivial when $c$ is infinite and therefore
we assume, in proving this inequality, that, moreover, $c$ is finite.

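Proof. We can assume, without restricting the generality, that the
$X_n$, and hence the $S_n$, are centered at expectations, provided we note
that $|X| \le c$ implies $|EX| \le c$ and, hence, $|X - EX| \le 2c$.

Let

$$A_k = \Bigl[\max_{j \le k} |S_j| < \varepsilon\Bigr],$$

$$B_k = A_{k-1} - A_k = [\,|S_1| < \varepsilon, \dots, |S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\,]$$

so that

$$A_0 = \Omega, \quad A_n^c = \sum_{k=1}^{n} B_k, \quad B_k \subset [\,|S_{k-1}| < \varepsilon, |S_k| \ge \varepsilon\,].$$

1° Since $S_k I_{B_k}$ and $S_n - S_k$ are orthogonal, it follows that

$$\int_{B_k} S_n^2 = E(S_n I_{B_k})^2 = E(S_k I_{B_k})^2 + E((S_n - S_k) I_{B_k})^2 \ge E(S_k I_{B_k})^2 \ge \varepsilon^2 PB_k.$$

Summing over $k = 1, \dots, n$, we obtain

$$\sum_{k=1}^{n} \sigma^2 X_k = ES_n^2 \ge \int_{A_n^c} S_n^2 \ge \varepsilon^2 \sum_{k=1}^{n} PB_k = \varepsilon^2 PA_n^c,$$

and the right-hand side inequality is proved.

2° Since

$$S_{k-1} I_{A_{k-1}} + X_k I_{A_{k-1}} = S_k I_{A_{k-1}} = S_k I_{A_k} + S_k I_{B_k}$$

and $S_{k-1} I_{A_{k-1}}$ and $X_k$ are orthogonal while $I_{A_k} I_{B_k} = 0$, it
follows that

$$E(S_{k-1} I_{A_{k-1}})^2 + \sigma^2 X_k \cdot PA_{k-1} = E(S_k I_{A_k})^2 + E(S_k I_{B_k})^2.$$

Since $PA_{k-1} \ge PA_n$ and $|X_k| \le 2c$, and hence

$$|S_k I_{B_k}| \le |S_{k-1} I_{B_k}| + |X_k I_{B_k}| \le (\varepsilon + 2c) I_{B_k},$$

it follows that

$$E(S_{k-1} I_{A_{k-1}})^2 + \sigma^2 X_k \cdot PA_n \le E(S_k I_{A_k})^2 + (\varepsilon + 2c)^2 PB_k.$$

Summing over $k = 1, \dots, n$, we obtain

$$\Bigl(\sum_{k=1}^{n} \sigma^2 X_k\Bigr) PA_n \le E(S_n I_{A_n})^2 + (\varepsilon + 2c)^2 \sum_{k=1}^{n} PB_k \le \varepsilon^2 PA_n + (\varepsilon + 2c)^2 PA_n^c \le (\varepsilon + 2c)^2,$$

and the left-hand side inequality follows.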
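
Both Kolmogorov inequalities can be checked exactly on a small example. The sketch below assumes the simplest bounded centered summands, fair steps $X_k = \pm 1$ (so $c = 1$ and $\sigma^2 X_k = 1$), and enumerates all $2^n$ paths; the function name `kolmogorov_check` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def kolmogorov_check(n, eps, c=1):
    """Exact P[max_k |S_k| >= eps] for n fair +/-1 steps, with both bounds."""
    hits = 0
    for steps in product((-1, 1), repeat=n):
        s, peak = 0, 0
        for x in steps:
            s += x
            peak = max(peak, abs(s))
        hits += peak >= eps
    prob = Fraction(hits, 2 ** n)
    total_var = n  # each fair +/-1 step has variance 1
    lower = 1 - Fraction((eps + 2 * c) ** 2, total_var)
    upper = Fraction(total_var, eps ** 2)
    return lower, prob, upper

lower, prob, upper = kolmogorov_check(12, 5)
```

For these parameters the lower bound is negative and hence carries no information; it becomes informative only when $\sum \sigma^2 X_k$ exceeds $(\varepsilon + 2c)^2$, for instance `kolmogorov_check(12, 1)`.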

17.3 Convergence and stability. We apply now Kolmogorov inequalities
and the truncation method to convergence and stability problems for
consecutive sums $S_n$ of independent r.v.'s $X_1, X_2, \dots$


I. CONVERGENCE. In this Chapter, convergence means convergence
to a finite number or to a finite function (r.v.).


a. If $\sum \sigma^2 X_n$ converges, then $\sum (X_n - EX_n)$ converges a.s.
If $\sum \sigma^2 X_n$ diverges and the $X_n$ are uniformly bounded, then
$\sum (X_n - EX_n)$ diverges a.s. Thus, if the $X_n$ are uniformly bounded,
then $\sum (X_n - EX_n)$ converges a.s. if, and only if, $\sum \sigma^2 X_n$
converges.


This follows, by letting $m, n \to \infty$ in Kolmogorov's inequalities with
$S_k$ replaced by $S_{m+k} - S_m$.


b. If the $X_n$ are uniformly bounded and $\sum X_n$ converges a.s., then
$\sum \sigma^2 X_n$ and $\sum EX_n$ converge.

Proof. To the r.v.'s $X_n$ we associate r.v.'s $X'_n$ such that $X_n$ and
$X'_n$ are identically distributed for every $n$ and
$X_1, X'_1, X_2, X'_2, \dots$ is a sequence of independent r.v.'s. We form
the "symmetrized" sequence $X_n^s = X_n - X'_n$ of independent r.v.'s, and
have

$$|X_n^s| \le |X_n| + |X'_n| \le 2c, \quad EX_n^s = EX_n - EX'_n = 0,$$

$$\sigma^2 X_n^s = \sigma^2 X_n + \sigma^2 X'_n = 2\sigma^2 X_n.$$

Since $\sum X_n$ converges a.s., so does $\sum X'_n$ and hence
$\sum X_n^s$ $(= \sum X_n - \sum X'_n)$. It follows, by a, that
$\sum \sigma^2 X_n^s$ and hence $\sum \sigma^2 X_n$ converge and, again by a,
$\sum (X_n - EX_n)$ converges a.s., so that
$\sum EX_n = \sum X_n - \sum (X_n - EX_n)$ converges. The assertion is
proved.


Let $X^c$ be $X$ truncated at (a finite) $c > 0$. We have Kolmogorov's


A. THREE-SERIES CRITERION. The series $\sum X_n$ of independent
summands converges a.s. to a r.v. if, and only if, for a fixed $c > 0$, the
three series

$$\text{(i)} \ \sum P[|X_n| \ge c], \quad \text{(ii)} \ \sum \sigma^2 X_n^c, \quad \text{(iii)} \ \sum EX_n^c$$

converge.


Proof. Convergence of (i) entails, by the equivalence lemma,
convergence-equivalence of $\sum X_n$ and $\sum X_n^c$, and convergence of
(ii) and (iii) entails, by a, a.s. convergence of $\sum X_n^c$. This proves
the "if" assertion.

Conversely, let $\sum X_n$ converge a.s. so that $X_n \to 0$ a.s. By 16.3A,
Cor. 2, (i) converges, so that, by the equivalence lemma, $\sum X_n^c$
converges a.s. and, by b, (ii) and (iii) converge. This proves the "only
if" assertion.


COROLLARY. If at least one of the three series in A does not converge,
then $\sum X_n$ diverges a.s.


For, by 16.3B (Corollary), $\sum X_n$ either converges a.s. or diverges a.s.
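
As a worked illustration of the criterion and its corollary (not part of the original text), consider random signs. The symbols $\varepsilon_n$ and the choice $c = 2$ are assumptions made for the example: the $\varepsilon_n$ are independent with $P[\varepsilon_n = 1] = P[\varepsilon_n = -1] = 1/2$, so both sequences below are bounded by 1 and truncation at $c = 2$ changes nothing.

```latex
% Assumption: \varepsilon_n independent fair signs, truncation level c = 2,
% hence X_n^c = X_n, E X_n^c = 0 and \sigma^2 X_n^c = E X_n^2.
\begin{align*}
X_n = \frac{\varepsilon_n}{n}:\quad
  &\text{(i) } \sum P[|X_n| \ge 2] = 0, \quad
   \text{(ii) } \sum \sigma^2 X_n^{c} = \sum \frac{1}{n^2} < \infty, \quad
   \text{(iii) } \sum E X_n^{c} = 0, \\
X_n = \frac{\varepsilon_n}{\sqrt{n}}:\quad
  &\text{(i) } \sum P[|X_n| \ge 2] = 0, \quad
   \text{(ii) } \sum \sigma^2 X_n^{c} = \sum \frac{1}{n} = \infty, \quad
   \text{(iii) } \sum E X_n^{c} = 0.
\end{align*}
```

All three series converge in the first case, so $\sum \varepsilon_n / n$ converges a.s.; series (ii) diverges in the second case, so $\sum \varepsilon_n / \sqrt{n}$ diverges a.s.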


Remark. In the proof of b we introduced a “symmetrized” sequence. 
This is an application of the “symmetrization method,” to be expounded 
in the next section. 


II. A.S. STABILITY. We seek conditions under which
$\frac{S_n}{b_n} - a_n \to 0$ a.s. when $b_n \uparrow \infty$, and require the
following elementary proposition.

TOEPLITZ LEMMA. Let $a_{nk}$, $k = 1, 2, \dots, k_n$, be numbers such that,
for every fixed $k$, $a_{nk} \to 0$ and, for all $n$,
$\sum_k |a_{nk}| \le c < \infty$; let $x'_n = \sum_k a_{nk} x_k$.
Then, $x_n \to 0$ entails $x'_n \to 0$ and, if $\sum_k a_{nk} \to 1$, then
$x_n \to x$ finite entails $x'_n \to x$. In particular, if
$b_n = \sum_{k=1}^{n} a_k \uparrow \infty$, then $x_n \to x$ finite entails

$$\frac{1}{b_n} \sum_{k=1}^{n} a_k x_k \to x.$$

The proof is immediate. If $x_n \to 0$ then, for a given $\varepsilon > 0$ and
$n \ge n_\varepsilon$ sufficiently large, $|x_n| < \varepsilon / c$ so that

$$|x'_n| \le \sum_{k < n_\varepsilon} |a_{nk} x_k| + \varepsilon.$$

Letting $n \to \infty$ and then $\varepsilon \to 0$, it follows that
$x'_n \to 0$. The second assertion follows, since then

$$x'_n = \Bigl(\sum_k a_{nk}\Bigr) x + \sum_k a_{nk} (x_k - x) \to x.$$

And setting $a_{nk} = a_k / b_n$, $k \le n$, the particular case is proved.


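The particular case yields the powerful


KRONECKER LEMMA. If $\sum x_n$ converges to $s$ finite and
$b_n \uparrow \infty$, then

$$\frac{1}{b_n} \sum_{k=1}^{n} b_k x_k \to 0.$$

For, setting $b_0 = 0$, $a_k = b_k - b_{k-1}$, $s_{n+1} = \sum_{k=1}^{n} x_k$, we
have

$$\frac{1}{b_n} \sum_{k=1}^{n} b_k x_k = \frac{1}{b_n} \sum_{k=1}^{n} b_k (s_{k+1} - s_k) = s_{n+1} - \frac{1}{b_n} \sum_{k=1}^{n} a_k s_k \to s - s = 0.$$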
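
A small numerical illustration of the Kronecker lemma may help fix ideas. The sketch assumes the convergent series $x_k = 1/k^2$ and the weights $b_k = k$, both chosen only for the example; the weighted averages then equal $H_n / n$ (with $H_n$ the harmonic sum) and decrease to 0.

```python
def kronecker_average(x, b, n):
    """Return (1 / b(n)) * sum_{k=1}^n b(k) * x(k)."""
    return sum(b(k) * x(k) for k in range(1, n + 1)) / b(n)

# x_k = 1 / k^2 is summable and b_k = k increases to infinity.
averages = [
    kronecker_average(lambda k: 1.0 / k**2, lambda k: float(k), n)
    for n in (10, 1_000, 100_000)
]
```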


We are now in a position to prove Kolmogorov’s proposition below. 


A. If the integrable r.v.'s $X_n$ are independent, then
$\sum \frac{\sigma^2 X_n}{b_n^2} < \infty$, $b_n \uparrow \infty$, entails
$\frac{S_n - ES_n}{b_n} \to 0$ a.s.

For, by Ia, convergence of $\sum \frac{\sigma^2 X_n}{b_n^2}$ entails a.s.
convergence of $\sum \frac{X_n - EX_n}{b_n}$, and the Kronecker lemma
applies.

We can now prove an extension of Borel's strong law of large numbers.


B. KOLMOGOROV STRONG LAW OF LARGE NUMBERS. If the independent r.v.'s
$X_n$ are identically distributed with a common law $\mathcal{L}(X)$, then
$\frac{X_1 + \cdots + X_n}{n} \to c$ finite a.s. if, and only if,
$E|X| < \infty$; and then $c = EX$.


Proof. We set $A_n = [|X| \ge n]$, $A_0 = \Omega$, and observe that, for
every $n$, $PA_n = P[|X_n| \ge n]$, while

$$\sum_{n=1}^{\infty} PA_n = \sum_{n=1}^{\infty} (n - 1)(PA_{n-1} - PA_n) \le \sum_{n=1}^{\infty} E|X| I_{A_{n-1} A_n^c} \le \sum_{n=1}^{\infty} n(PA_{n-1} - PA_n) \le 1 + \sum_{n=1}^{\infty} PA_n$$

or

$$\sum_{n=1}^{\infty} PA_n \le E|X| \le 1 + \sum_{n=1}^{\infty} PA_n.$$

If $\frac{S_n}{n} \to c$ finite a.s., then

$$\frac{X_n}{n} = \frac{S_n}{n} - \frac{n - 1}{n} \cdot \frac{S_{n-1}}{n - 1} \to 0 \ \text{a.s.}$$

and, hence, by 16.3a, Cor. 2, $\sum PA_n < \infty$. This proves the "only if"
assertion and it remains to prove that, if $E|X| < \infty$, then
$\frac{S_n}{n} \to EX$ a.s.

Let $E|X| < \infty$ and set $\bar{S}_n = \sum_{k=1}^{n} \bar{X}_k$, where
$\bar{X}_k$ represents $X_k$ truncated at $k$. Since

$$\sum P[|X_n| \ge n] = \sum PA_n \le E|X| < \infty,$$

it follows that the sequences $S_n / n$ and $\bar{S}_n / n$ have same limit,
and it suffices to prove that $\bar{S}_n / n \to EX$ a.s. Since, by the
dominated convergence theorem,

$$E\bar{X}_n = EX I_{A_n^c} \to EX$$

and, hence, by the Toeplitz lemma, $\frac{E\bar{S}_n}{n} \to EX$, it suffices
to prove that $\frac{\bar{S}_n - E\bar{S}_n}{n} \to 0$ a.s. But

$$\sum_n \frac{\sigma^2 \bar{X}_n}{n^2} \le \sum_n \frac{E\bar{X}_n^2}{n^2} = E \sum_n \frac{X^2}{n^2} I_{A_n^c} \le 2 + E|X| < \infty,$$

since, setting $B_m = [m - 1 \le |X| < m]$, we have $A_n^c B_m = \emptyset$ or
$B_m$ according as $n < m$ or $n \ge m$ and, hence,

$$\sum_n \frac{X^2}{n^2} I_{A_n^c} I_{B_m} \le m^2 \Bigl(\frac{1}{m^2} + \frac{1}{(m + 1)^2} + \cdots\Bigr) I_{B_m} \le \Bigl(1 + m^2 \int_m^{\infty} \frac{dx}{x^2}\Bigr) I_{B_m} \le (2 + |X|) I_{B_m},$$

so that, by summing over $m$, we obtain the bound $2 + E|X|$. Thus,
theorem A applies, and the proof is complete.
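
The statement of the strong law can be illustrated by simulation. The sketch below assumes i.i.d. exponential summands with $EX = 1$ and a fixed pseudo-random seed, both arbitrary choices for the example; it shows one sample path of $S_n / n$ and does not, of course, replace the proof.

```python
import random

def running_means(sample, checkpoints):
    """Return S_n / n at each n in checkpoints for the given sample."""
    means, total = {}, 0.0
    wanted = set(checkpoints)
    for n, x in enumerate(sample, start=1):
        total += x
        if n in wanted:
            means[n] = total / n
    return means

rng = random.Random(12345)  # fixed seed, an arbitrary choice
sample = [rng.expovariate(1.0) for _ in range(200_000)]  # E X = 1
means = running_means(sample, [100, 10_000, 200_000])
```

Along this path the running means settle near 1 as $n$ grows, as the theorem predicts for a.e. path.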

*17.4 Generalization. Let $c$, with or without affixes, be finite
positive numbers and let $g_n$ be continuous and nondecreasing functions on
$[0, +\infty)$ such that $g_n(0) = 0$ and $g_n(x) \ge cx^2$ or $\ge c'$
according as $0 < x < c_n$ or $x \ge c_n$.


a. If the series (i) $\sum P[|X_n| \ge c_n]$ and (ii)
$\sum Eg_n(|X_n^{c_n}|)$ converge, then $\sum (X_n - EX_n^{c_n})$ converges
a.s.


For convergence of (i) entails, by the equivalence lemma,
convergence-equivalence of $\sum (X_n - EX_n^{c_n})$ and
$\sum (X_n^{c_n} - EX_n^{c_n})$ and, by Ia, this last series converges a.s.,
since convergence of (ii) entails

$$\sum \sigma^2 X_n^{c_n} \le \sum E(X_n^{c_n})^2 \le \frac{1}{c} \sum Eg_n(|X_n^{c_n}|) < \infty.$$


b. If the series (i) $\sum Eg_n(|X_n|)$ or (ii)
$\sum \int_0^{c_n} P[|X_n| \ge x] \, dg_n(x)$ converges, then
$\sum (X_n - EX_n^{c_n})$ converges a.s.

For convergence of (i) entails

$$\sum P[|X_n| \ge c_n] \le \frac{1}{c'} \sum Eg_n(|X_n|) < \infty$$

and

$$\sum Eg_n(|X_n^{c_n}|) \le \sum Eg_n(|X_n|) < \infty,$$

so that a applies.
Similarly, convergence of (ii) entails, by integration by parts,

$$\infty > \sum \int_0^{c_n} P[|X_n| \ge x] \, dg_n(x) = \sum g_n(c_n) P[|X_n| \ge c_n] + \sum \int_0^{c_n} g_n(x) \, dP[|X_n| < x]$$

$$\ge c' \sum P[|X_n| \ge c_n] + \sum Eg_n(|X_n^{c_n}|),$$

so that a applies.

A. If the series (i) $\sum Eg_n\bigl(\bigl|\frac{X_n}{b_n}\bigr|\bigr)$ or (ii)
$\sum \int_0^{c_n} P[|X_n| \ge b_n x] \, dg_n(x)$ converges, then

$$\sum \frac{X_n - EX_n^{b_n c_n}}{b_n} \ \text{converges a.s. and} \ \frac{1}{b_n} \sum_{k=1}^{n} (X_k - EX_k^{b_k c_k}) \to 0 \ \text{a.s.}$$

Moreover, if (i) converges and $g_n(x) \ge cx$ for $0 < x \le c_n$ or for
$x \ge c_n$, then $EX_n^{b_n c_n}$ can be replaced by 0 or by $EX_n$,
respectively.


Proof. The first assertion follows from b and the Kronecker lemma.
As for the second assertion, if $\sum'$ and $\sum''$ denote summations over
those values of $n$ for which the first, respectively, the second,
assumption about $g_n$ holds, then

$$\sum{}' \Bigl| E \frac{X_n^{b_n c_n}}{b_n} \Bigr| \le \sum{}' \int_0^{b_n c_n} \frac{x}{b_n} \, dP[|X_n| < x] \le \frac{1}{c} \sum{}' Eg_n\Bigl(\Bigl|\frac{X_n}{b_n}\Bigr|\Bigr) < \infty$$

and

$$\sum{}'' \Bigl| E \frac{X_n}{b_n} - E \frac{X_n^{b_n c_n}}{b_n} \Bigr| \le \sum{}'' \int_{b_n c_n}^{\infty} \frac{x}{b_n} \, dP[|X_n| < x] \le \frac{1}{c} \sum{}'' Eg_n\Bigl(\Bigl|\frac{X_n}{b_n}\Bigr|\Bigr) < \infty.$$

This completes the proof.
Particular cases. 1° Let $g_n(x) = |x|^{r_n}$ with $0 < r_n \le 2$. Theorem
A yields

If $b_n \uparrow \infty$ and $\sum \frac{E|X_n|^{r_n}}{b_n^{r_n}} < \infty$, then
$\frac{1}{b_n} \sum_{k=1}^{n} (X_k - a_k) \to 0$ a.s., where $a_n = 0$ or
$EX_n$ according as $0 < r_n < 1$ or $1 \le r_n \le 2$.

For $r_n = 2$, we find 17.3 II A.

2° Let $g_n(x) = x^2$ for $0 \le x \le 1$ and, to simplify the writing, set
$q(x) = P[|X| \ge x]$, $q_n(x) = P[|X_n| \ge x]$. Theorem A yields

If $\int_0^1 x \bigl\{\sum q_n(b_n x)\bigr\} \, dx < \infty$, then
$\sum \frac{1}{b_n} (X_n - EX_n^{b_n})$ converges a.s. and
$\frac{1}{b_n} \sum_{k=1}^{n} (X_k - EX_k^{b_k}) \to 0$ a.s.

We require the following


MOMENTS LEMMA. For every $r > 0$ and $x > 0$

$$x^r \sum_{n=1}^{\infty} q(n^{1/r} x) \le E|X|^r \le x^r + x^r \sum_{n=1}^{\infty} q(n^{1/r} x).$$

This follows from

$$E|X|^r = -\int_0^{\infty} t^r \, dq(t) = -\sum_{n=1}^{\infty} \int_{(n-1)^{1/r} x}^{n^{1/r} x} t^r \, dq(t)$$

and

$$(n - 1) x^r \{q((n - 1)^{1/r} x) - q(n^{1/r} x)\} \le -\int_{(n-1)^{1/r} x}^{n^{1/r} x} t^r \, dq(t) \le n x^r \{q((n - 1)^{1/r} x) - q(n^{1/r} x)\},$$

by summing the inequalities over $n = 1, 2, \dots$ and rearranging the
terms.


3° If $b_n = n^{1/r}$ and the laws of the r.v.'s $X_n$ are uniformly bounded
by the law of a r.v. $X$, that is, $q_n \le q$, then $E|X|^r < \infty$ entails

$$\int_0^1 x \Bigl\{\sum q_n(n^{1/r} x)\Bigr\} dx \le \int_0^1 x \Bigl\{\sum q(n^{1/r} x)\Bigr\} dx \le E|X|^r \int_0^1 \frac{dx}{x^{r-1}}$$

so that the right-hand side is finite for $r < 2$. Therefore, on account of
2°,

If $q_n \le q$ and $E|X|^r < \infty$ with $r < 2$, then
$\frac{1}{n^{1/r}} \sum_{k=1}^{n} (X_k - a_k) \to 0$ a.s., where $a_k = 0$ or
$EX_k$ according as $r < 1$ or $r \ge 1$.


4° If $F_n = F$, then the converse is also true. More precisely
(Kolmogorov: $r = 1$; Marcinkiewicz: $r \ne 1$),

Let the independent r.v.'s $X_n$ be identically distributed with common
law $\mathcal{L}(X)$, and let $0 < r < 2$.

If $E|X|^r < \infty$, then
$\frac{1}{n^{1/r}} \sum_{k=1}^{n} (X_k - a_k) \to 0$ a.s. with $a_k = 0$ or $EX$
according as $r < 1$ or $r \ge 1$.

Conversely, if $\frac{1}{n^{1/r}} \sum_{k=1}^{n} (X_k - a_k) \to 0$ a.s., then
$E|X|^r < \infty$.


Proof. The first assertion is a particular case of the preceding
proposition. As for the converse proposition, we use the symmetrization
method expounded in the following section.

Let $X'_n$ be a sequence independent of the sequence $X_n$ and with
same distribution, and let $X'$ be independent of $X$ and with same
distribution; set $X_n^s = X_n - X'_n$ and $X^s = X - X'$. Then, on account
of the assumption,

$$Y_n = \frac{1}{n^{1/r}} \sum_{k=1}^{n} X_k^s = \frac{1}{n^{1/r}} \sum_{k=1}^{n} (X_k - a_k) - \frac{1}{n^{1/r}} \sum_{k=1}^{n} (X'_k - a_k) \to 0 \ \text{a.s.}$$

and, hence,

$$\frac{X_n^s}{n^{1/r}} = Y_n - \Bigl(\frac{n - 1}{n}\Bigr)^{1/r} Y_{n-1} \to 0 \ \text{a.s.}$$

Since the $X_n^s$ are independent r.v.'s, it follows that, for every $x > 0$,

$$\sum q^s(n^{1/r} x) = \sum P[|X_n^s| \ge n^{1/r} x] < \infty.$$

Therefore, by the moments lemma, $E|X^s|^r < \infty$ so that, by 18.1A,
Corollary 2,

$$E|X - \mu X|^r \le 2E|X^s|^r < \infty$$

and, hence, by the $c_r$-inequality,

$$E|X|^r \le c_r E|X - \mu X|^r + c_r |\mu X|^r < \infty.$$

The proof is complete.


*§ 18. CONVERGENCE AND STABILITY OF SUMS; CENTERING AT 
MEDIANS AND SYMMETRIZATION 


While centering at expectations goes back to Bernoulli and use of 
bounds in terms of variances goes back to Tchebichev, centering at 
medians and symmetrization are relatively recent. Yet, not only do 
they complete the first ones, but they also tend to replace them
altogether. Moreover, medians always exist and the ch.f.'s of symmetrized
r.v.’s, being real-valued, are much easier to handle than complex-valued 
ones. 

*18.1 Centering at medians and symmetrization. Let $F$ be the d.f.
of a r.v. $X$. There exists at least one finite number $\mu X$ called a
median of $X$, such that

$$P[X \ge \mu X] \ge \tfrac{1}{2} \le P[X \le \mu X]$$

or, equivalently,

$$F(\mu X) \le \tfrac{1}{2}, \quad F(\mu X + 0) \ge \tfrac{1}{2}.$$

For, $F$ being nondecreasing on $R$ with $F(-\infty) = 0$, $F(+\infty) = 1$,
the graph of $y = F(x)$ completed at its discontinuity points by the
segments $(x, F(x))$ to $(x, F(x + 0))$ has either a point or a segment
parallel to the $x$-axis, in common with the line $y = \tfrac{1}{2}$.
According to the foregoing definition, the abscissae of the common point
or of the common segment are medians of $X$ so that either $X$ has a unique
median or it has for medians all points of a closed interval on $R$, the
median segment of $X$.

It follows from the definition of medians that, for every finite number
$c$, we can set $\mu(cX) = c\mu X$. Furthermore, there is a relation
between $\mu X$, $EX$, and $\sigma^2 X$, namely,

a. If $X$ is integrable, then $|\mu X - EX| \le \sqrt{2\sigma^2 X}$.

For, by Tchebichev's inequality,

$$P[|X - EX| \ge \sqrt{2\sigma^2 X}] \le \tfrac{1}{2},$$

so that

$$EX - \sqrt{2\sigma^2 X} \le \mu X \le EX + \sqrt{2\sigma^2 X}.$$

A r.v. $X$ and its law as well as its d.f. $F$ and ch.f. $f$ are said to be
symmetric if, for every $x$,

$$(1) \quad P[X \le x] = P[X \ge -x];$$

equivalently,

$$(2) \quad F(-x + 0) = 1 - F(x),$$

or, for every pair $a < b$ of continuity points of $F$,

$$(3) \quad F[a, b) = F[-b, -a),$$

or

$$(4) \quad f = \bar{f} \ \text{is real.}$$

The symmetrization procedure consists in assigning to a r.v. $X$ the
symmetrized r.v. $X^s = X - X'$, where $X'$ is independent of $X$ and has the
same distribution. More generally, if $X = \{X_t, t \in T\}$ is a family of
r.v.'s, then the symmetrized family is $X^s = \{X_t - X'_t, t \in T\}$ where
the family $X'$ is independent of $X$ and has same distribution. If $X$ has
affixes we affix them to $X^s$ as well as to its d.f. and ch.f. Clearly


b. To a r.v. $X$ with ch.f. $f$, there corresponds a symmetric r.v.
$X^s = X - X'$ where $X$ and $X'$ are independent and identically
distributed, and $f^s = |f|^2$ is the ch.f. of $X^s$.
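
Proposition b can be confirmed numerically on a finite law. The sketch below assumes a law given as `{value: probability}`; `ch_f` and `symmetrized` are illustrative names, and the particular law and argument `u` are arbitrary.

```python
import cmath

def ch_f(law, u):
    """Characteristic function E exp(iuX) of a finite law {value: probability}."""
    return sum(p * cmath.exp(1j * u * x) for x, p in law.items())

def symmetrized(law):
    """Law of X - X' with X, X' independent and both distributed as `law`."""
    out = {}
    for x, p in law.items():
        for y, q in law.items():
            out[x - y] = out.get(x - y, 0.0) + p * q
    return out

law = {0: 0.7, 1: 0.2, 3: 0.1}
u = 1.3
lhs = ch_f(symmetrized(law), u)   # ch.f. of the symmetrized r.v. at u
rhs = abs(ch_f(law, u)) ** 2      # |f(u)|^2
```

The two values agree up to rounding, and the imaginary part of `lhs` vanishes, in line with (4).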


We arrive now at inequalities which are the basic reason for centering 
at medians. 


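A. WEAK SYMMETRIZATION INEQUALITIES. For every $\varepsilon$ and every $a$,

$$\text{(i)} \quad \tfrac{1}{2} P[X - \mu X \ge \varepsilon] \le P[X^s \ge \varepsilon]$$

and

$$\text{(ii)} \quad \tfrac{1}{2} P[|X - \mu X| \ge \varepsilon] \le P[|X^s| \ge \varepsilon] \le 2P\Bigl[|X - a| \ge \frac{\varepsilon}{2}\Bigr].$$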


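Proof. Since $X^s = X - X'$ where $X$ and $X'$ are independent and
identically distributed, it follows that to a median $\mu = \mu X$
corresponds an equal median $\mu = \mu X'$ and

$$P[X^s \ge \varepsilon] = P[(X - \mu) - (X' - \mu) \ge \varepsilon] \ge P[X - \mu \ge \varepsilon, \ X' - \mu \le 0]$$

$$= P[X - \mu \ge \varepsilon] \cdot P[X' - \mu \le 0] \ge \tfrac{1}{2} P[X - \mu \ge \varepsilon].$$

This proves inequality (i) which, together with the inequality obtained
by changing in (i) $X$ into $-X$, entails the left-hand side inequality in
(ii). The right-hand side inequality in (ii) follows from the identical
distribution of $X$ and $X'$ only, by

$$P[|X^s| \ge \varepsilon] = P[|(X - a) - (X' - a)| \ge \varepsilon] \le P\Bigl[|X - a| \ge \frac{\varepsilon}{2}\Bigr] + P\Bigl[|X' - a| \ge \frac{\varepsilon}{2}\Bigr] = 2P\Bigl[|X - a| \ge \frac{\varepsilon}{2}\Bigr].$$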


COROLLARY 1. If $X_n - a_n \to 0$ in pr., then $X_n^s \to 0$ in pr. and
$a_n - \mu X_n \to 0$, and conversely.


This follows by letting $n \to \infty$ in (ii) where $X$ is replaced by $X_n$.

COROLLARY 2. For $r > 0$ and every $a$,

$$\tfrac{1}{2} E|X - \mu X|^r \le E|X^s|^r \le 2c_r E|X - a|^r$$

where $c_r = 1$ or $2^{r-1}$ according as $r \le 1$ or $r \ge 1$.


Proof. The right-hand side inequality follows, by the $c_r$-inequality,
from

$$E|X^s|^r = E|(X - a) - (X' - a)|^r \le c_r E|X - a|^r + c_r E|X' - a|^r = 2c_r E|X - a|^r.$$

As for the left-hand side inequality, it is trivial when
$E|X^s|^r = \infty$ and then, according to the inequality just proved (with
$a = \mu X$), $E|X - \mu X|^r = \infty$; thus, we can assume that
$E|X^s|^r$ is finite. Let

$$q(t) = P[|X - \mu X| \ge t] \quad \text{and} \quad q^s(t) = P[|X^s| \ge t]$$

so that, by A(ii),

$$q(t) \le 2q^s(t).$$

It follows, upon integrating by parts, that

$$E|X - \mu X|^r = -\int_0^{\infty} t^r \, dq(t) = \int_0^{\infty} q(t) \, d(t^r) \le 2\int_0^{\infty} q^s(t) \, d(t^r) = -2\int_0^{\infty} t^r \, dq^s(t) = 2E|X^s|^r,$$

and the proof is concluded.
This corollary was used at the end of the preceding section. 


We pass now to symmetrized families and recall that, if two families
$\{X_t, t \in T\}$ and $\{X'_t, t \in T\}$ are independent, then events
defined in terms of the $X_t$ and in terms of the $X'_t$, respectively, are
independent. We require the following

c. LEMMA FOR EVENTS. Let events with subscript 0 be empty. If, for
every integer $j \ge 1$, $A_j A_{j-1}^c \cdots A_0^c$ and $B_j$ are
independent, then

$$P \bigcup A_j B_j \ge \alpha P \bigcup A_j, \quad \alpha = \inf PB_j.$$

More generally, if $(A_j + A'_j)(A_{j-1} + A'_{j-1})^c \cdots (A_0 + A'_0)^c$
are independent of $B_j$ and of $B'_j$, then

$$P \bigcup (A_j B_j + A'_j B'_j) \ge \alpha P \bigcup (A_j + A'_j), \quad \alpha = \inf (PB_j, PB'_j).$$

Proof. The same method applies to both cases. For instance

$$P \bigcup A_j B_j = PA_1 B_1 + P(A_1 B_1)^c A_2 B_2 + P(A_1 B_1)^c (A_2 B_2)^c A_3 B_3 + \cdots$$

$$\ge PA_1 B_1 + PA_1^c A_2 B_2 + PA_1^c A_2^c A_3 B_3 + \cdots$$

$$= PA_1 \cdot PB_1 + PA_1^c A_2 \cdot PB_2 + PA_1^c A_2^c A_3 \cdot PB_3 + \cdots$$

$$\ge \alpha (PA_1 + PA_1^c A_2 + PA_1^c A_2^c A_3 + \cdots) = \alpha P \bigcup A_j.$$


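B. SYMMETRIZATION INEQUALITIES. For every $\varepsilon$ and every $a_j$, $j \le n$,

(i) $\tfrac{1}{2} P[\sup_j (X_j - \mu X_j) \ge \varepsilon] \le P[\sup_j X_j^s \ge \varepsilon]$

(ii) $\tfrac{1}{2} P[\sup_j |X_j - \mu X_j| \ge \varepsilon] \le P[\sup_j |X_j^s| \ge \varepsilon] \le 2P\big[\sup_j |X_j - a_j| \ge \tfrac{\varepsilon}{2}\big].$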


Proof. Since $X_j^s = X_j - X'_j$ and the families $\{X_j\}$ and $\{X'_j\}$ are
independent and identically distributed, it follows that to medians
$\mu_j = \mu X_j$ correspond equal medians $\mu_j = \mu X'_j$; setting
$$A_j = [X_j - \mu_j \ge \varepsilon], \quad B_j = [X'_j - \mu_j \le 0], \quad C_j = [X_j^s \ge \varepsilon'],$$
so that $A_jB_j \subset C_j$, the lemma for events applies, with $\alpha = \tfrac{1}{2}$, and
$$\tfrac{1}{2} P\bigcup A_j \le P\bigcup A_jB_j \le P\bigcup C_j.$$
This proves (i) by letting $\varepsilon' \uparrow \varepsilon$, and (ii) follows by arguments similar
to those used in the proof of A and by the lemma for events.


COROLLARY. If $X_n - a_n \xrightarrow{\text{a.s.}} 0$, then $X_n^s \xrightarrow{\text{a.s.}} 0$ and $a_n - \mu X_n \to 0$;
and conversely.


By centering sums of independent r.v.’s at suitable medians, we ob- 
tain inequalities which can play the role of Kolmogorov’s inequalities. 


C. P. LÉVY INEQUALITIES. If $X_1, \dots, X_n$ are independent r.v.’s and
$S_k = \sum_{j=1}^k X_j$, then, for every $\varepsilon$,

(i) $P[\max_{k \le n} (S_k - \mu(S_k - S_n)) \ge \varepsilon] \le 2P[S_n \ge \varepsilon]$

and

(ii) $P[\max_{k \le n} |S_k - \mu(S_k - S_n)| \ge \varepsilon] \le 2P[|S_n| \ge \varepsilon].$

Proof. Let $S_0 = 0$, $S^*_k = \max_{j \le k} (S_j - \mu(S_j - S_n))$ and set
$$A_k = [S^*_{k-1} < \varepsilon,\ S_k - \mu(S_k - S_n) \ge \varepsilon],$$
$$B_k = [S_n - S_k - \mu(S_n - S_k) \ge 0]$$
where $\mu(S_n - S_k) = -\mu(S_k - S_n)$. Since
$$[S^*_n \ge \varepsilon] = \sum_{k=1}^n A_k, \quad [S_n \ge \varepsilon] \supset \sum_{k=1}^n A_kB_k, \quad PB_k \ge \tfrac{1}{2},$$
(i) follows upon applying the lemma for events or, directly, by
$$P[S_n \ge \varepsilon] \ge \sum_{k=1}^n PA_k PB_k \ge \tfrac{1}{2} \sum_{k=1}^n PA_k = \tfrac{1}{2} P[S^*_n \ge \varepsilon].$$
By changing the signs of all r.v.’s which figure in (i) and combining with
(i), inequality (ii) follows, and the proof is complete.
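
Inequality (ii) can be verified exactly for short walks by enumerating every sign pattern. The sketch below assumes fair $\pm 1$ summands, for which every difference $S_k - S_n$ is symmetric and 0 may be taken as its median; the function name `levy_sides` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def levy_sides(n, eps):
    """Exact P[max_k |S_k| >= eps] and P[|S_n| >= eps] for n fair +/-1 steps."""
    hit_max = hit_end = 0
    for signs in product((-1, 1), repeat=n):
        s = running_max = 0
        for x in signs:
            s += x
            running_max = max(running_max, abs(s))
        hit_max += running_max >= eps   # path reaches level eps at some k <= n
        hit_end += abs(s) >= eps        # path ends at or beyond level eps
    total = 2 ** n
    return Fraction(hit_max, total), Fraction(hit_end, total)


for eps in (1, 2, 4, 6):
    lhs, rhs = levy_sides(10, eps)
    print(eps, lhs, 2 * rhs, lhs <= 2 * rhs)
```

Enumeration is feasible only for small $n$ (here $2^{10}$ paths); it is meant as a sanity check of the statement, not as a substitute for the proof.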

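Remark. Let $X_1, \dots, X_n$ be independent, square-integrable, and
centered at expectations. Since, by a,
$$|\mu(S_k - S_n)| \le \sqrt{2\sigma^2(S_k - S_n)} \le \sqrt{2\sigma^2 S_n},$$
inequality (i) remains valid if $-\mu(S_k - S_n)$ is replaced by $-\sqrt{2\sigma^2 S_n}$ and,
hence, changing $\varepsilon$ into $\varepsilon - \sqrt{2\sigma^2 S_n}$,
$$P[\max_{k \le n} S_k \ge \varepsilon] \le 2P[S_n \ge \varepsilon - \sqrt{2\sigma^2 S_n}].$$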

*18.2 Convergence and stability. We are now in possession of the
basic tools and shall apply them to the investigation of convergence and
stability of sums $S_n = \sum_{k=1}^n X_k$ of independent r.v.’s. We recall that
here we say that a sequence of r.v.’s converges a.s. if it converges
a.s. to a r.v., and their sequence of laws converges if it converges to the
law of a r.v., that is, converges completely.

I. CONVERGENCE. Whatever be the sequence of r.v.’s, we have the
comparison table of convergences below:

    convergence a.s.  ⇒  convergence in pr.  ⇒  convergence of laws
                                 ⇑
                         convergence in q.m.

(“in q.m.” means “in the 2nd mean” and reads “in quadratic mean”).

For series of independent r.v.’s, reverse implications are also true,
either with no restriction or under a uniform boundedness restriction. 
More precisely 


a. IMPROVED CONVERGENCE LEMMA. For series of independent r.v.’s: 


(i) Convergence a.s. and convergence in pr. are equivalent. 


(ii) If the summands are uniformly bounded and centered at expectations, 
then convergence a.s., convergence in pr., convergence in q.m., and con- 
vergence of laws, are equivalent. 


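Proof. 1° Let $S_n \xrightarrow{P} S$, so that, by 6.3A, there exists a subsequence
$S_{n_k} \xrightarrow{\text{a.s.}} S$ with $\sum_k P\big[|S_{n_{k+1}} - S_{n_k}| \ge \tfrac{1}{2^k}\big] < \infty$. Let $n_k < n \le n_{k+1}$
and set $T_k = \max_{n_k < n \le n_{k+1}} |S_n - S_{n_k} - \mu(S_n - S_{n_{k+1}})|$, so that, by P. Lévy’s
inequality (ii),
$$\sum P\big[T_k \ge \tfrac{1}{2^k}\big] \le 2\sum P\big[|S_{n_{k+1}} - S_{n_k}| \ge \tfrac{1}{2^k}\big] < \infty$$
and, hence, $T_k \xrightarrow{\text{a.s.}} 0$ as $k \to \infty$. Therefore,
$$|S_n - S - \mu(S_n - S_{n_{k+1}})| \le |S_n - S_{n_k} - \mu(S_n - S_{n_{k+1}})| + |S_{n_k} - S| \le T_k + |S_{n_k} - S| \xrightarrow{\text{a.s.}} 0,$$
that is, $S_n - \mu(S_n - S_{n_{k+1}}) \xrightarrow{\text{a.s.}} S$ and, a fortiori, $S_n - \mu(S_n - S_{n_{k+1}}) \xrightarrow{P} S$.
Since $S_n \xrightarrow{P} S$, it follows that $\mu(S_n - S_{n_{k+1}}) \to 0$ and, hence,
$S_n \xrightarrow{\text{a.s.}} S$. Thus, convergence in pr. of the series $\sum X_n$ entails its
convergence a.s. and, the converse being always true, the first assertion is
proved.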

2° Let $|X_n| < c < \infty$ and $EX_n = 0$. The series $\sum X_n$ converges
in q.m. if, and only if, as $m, n \to \infty$,
$$E(S_n - S_m)^2 = \sum_{k=m+1}^n \sigma^2 X_k \to 0$$
or, equivalently, $\sum \sigma^2 X_n < \infty$; then it converges in pr. and, hence, by
the first assertion, it converges a.s. But if $\mathcal{L}(S_n) \to \mathcal{L}(S)$, so that for
all $u$ in some neighborhood of the origin
$$-\sum \log |f_n(u)| = -\log |f_S(u)| < \infty,$$
then, by 12.4B′, for $u$ belonging to the intersection of this neighborhood
with $(-1/c, +1/c)$,
$$\sum \sigma^2 X_n = \sum EX_n^2 \le -\frac{3}{u^2} \sum \log |f_n(u)|^2 < \infty,$$
and the second assertion follows.
The three-series criterion follows from this improved convergence 
lemma exactly as it followed from the convergence lemma in section 16. 


Remark. A better insight into the behavior of the series is provided
by the Liapounov theorem for the bounded case, according to which,
if $s_n^2 = \sum_{k=1}^n \sigma^2 X_k \to \infty$ and $ES_n = 0$, then, for any fixed $a > 0$ and
$\varepsilon > 0$ and $n$ large enough to have $\varepsilon s_n > a$, we have
$$(1) \quad P[|S_n| \ge a] \ge P[|S_n| \ge \varepsilon s_n] \to \frac{1}{\sqrt{2\pi}} \int_{|x| \ge \varepsilon} e^{-x^2/2}\,dx.$$
Thus, as $\varepsilon \to 0$, $P[|S_n| \ge a] \to 1$ for any fixed but arbitrarily large
$a$, and the sequence $\mathcal{L}(S_n)$ of laws diverges to a law degenerate at
infinity. The second assertion follows ad contrario, and we see that when
the sequence of laws does not converge, then, as $n \to \infty$, the distribution
of $S_n$ escapes to infinity in the fashion described by (1).


So far we have been concerned with convergence of a given series.
Yet various auxiliary centering constants appeared during the investigation,
and the problem arises whether, given the series $\sum X_n$ of independent
r.v.’s, there exist centering constants $a_n$ such that the series
$\sum (X_n - a_n)$ converges. If $\sum (X_n - a_n)$ converges a.s. for some numerical
constants $a_n$, we say that the series $\sum X_n$ is essentially convergent;
otherwise, we say that it is essentially divergent, since, then, by
the corollary of the zero-one law, $\sum (X_n - a_n)$ diverges a.s. whatever
be the $a_n$. As above, our problem is to find criteria for this dichotomy
and to find the suitable centering constants when the series is essentially
convergent; at the same time, we shall be able to improve the preceding
results (see also 37.1).


b. ESSENTIAL CONVERGENCE LEMMA. The series $\sum X_n$ is essentially
convergent if, and only if, the symmetrized series $\sum X_n^s$ converges a.s.

Proof. If $\sum X_n^s$ converges a.s., then, for every finite $c > 0$, using
17.1A, by the three series criterion,
$$\sum P[|X_n - \mu X_n| \ge c] \le 2\sum P[|X_n^s| \ge c] < \infty$$
and, upon integrating by parts,
$$\tfrac{1}{2} \sum \sigma^2 (X_n - \mu X_n)^c \le \sum \sigma^2 (X_n^s)^c + c^2 \sum P[|X_n^s| \ge c] < \infty.$$
Therefore, the series $\sum \{X_n - \mu X_n - E(X_n - \mu X_n)^c\}$ converges a.s.
and the “if” assertion is proved while the “only if” assertion is immediate.

From this proof follows the 


A. TWO-SERIES CRITERION. The series $\sum X_n$ is essentially convergent
if, and only if, for some arbitrarily fixed $c > 0$, the two series
$\sum P[|X_n - \mu X_n| \ge c]$ and $\sum \sigma^2 (X_n - \mu X_n)^c$ converge; then the centered series
$\sum \{X_n - \mu X_n - E(X_n - \mu X_n)^c\}$ converges a.s.


The essential convergence lemma permits us to improve further the 
convergence lemma. 
B. EQUIVALENCE THEOREM. For series of independent r.v.’s, convergence
of laws, convergence in pr. and a.s. convergence are equivalent.

Proof. It suffices to prove that convergence of laws implies a.s.
convergence. Let $f_n$ be the ch.f. of $X_n$ so that $|f_n|^2$ is ch.f. of $X_n^s$. If
$\prod_{k=1}^n f_k \to f$ ch.f., then $\prod_{k=1}^n |f_k|^2 \to |f|^2$ and, by 13.4 B′, the two series
$\sum P[|X_n^s| \ge c]$ and $\sum \sigma^2 (X_n^s)^c$ converge. Since $E(X_n^s)^c = 0$, it follows,
by the three series criterion, that the symmetrized series $\sum X_n^s$
converges a.s. Therefore, by the essential convergence lemma, there
exist constants $a_n$ such that the series $\sum (X_n - a_n)$ converges a.s. to
a r.v. and a fortiori its law converges completely, so that, for every $u$,
$$\exp\Big[-iu \sum_{k=1}^n a_k\Big] \prod_{k=1}^n f_k(u) \to f'(u),$$
where $f'$ is a ch.f. By taking $u$ close enough to
0 so that $f(u)f'(u) \ne 0$, it follows that the series $\sum a_n$ converges and,
hence, the series $\sum X_n$ converges a.s. This completes the proof.


COROLLARY 1. A series $\sum X_n$ of independent r.v.’s converges a.s. if,
and only if, $\prod_{k=1}^n f_k \to f$ and $f$ is continuous at the origin or $f \ne 0$ on a set
of positive Lebesgue measure.

This follows by the continuity theorem or 12.4, 4°.
COROLLARY 2. A series $\sum X_n$ of independent r.v.’s is essentially
convergent or divergent according as
$$\lim \prod_{k=1}^n |f_k| \ne 0 \text{ on a set of positive Lebesgue measure}$$
or
$$\lim \prod_{k=1}^n |f_k| = 0 \text{ a.e.}$$

This follows by 13.4, 4° and b.
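
The dichotomy of Corollary 2 can be seen numerically in a simple special case. The sketch below assumes summands $X_k = c_k \cdot (\pm 1)$ with fair signs, whose ch.f. is $\cos(c_k u)$; the function name `abs_chf_product` and the two coefficient sequences are hypothetical choices, and a finite product can only suggest the limiting behavior.

```python
import math


def abs_chf_product(coeffs, u):
    """Product of |cos(c*u)|, the moduli of the ch.f.'s of the summands c*(+/-1)."""
    prod = 1.0
    for c in coeffs:
        prod *= abs(math.cos(c * u))
    return prod


n = 10_000
square_summable = [1.0 / k for k in range(1, n + 1)]                 # sum of c_k^2 finite
not_square_summable = [1.0 / math.sqrt(k) for k in range(1, n + 1)]  # sum of c_k^2 infinite

print(abs_chf_product(square_summable, 1.0))      # stays away from 0
print(abs_chf_product(not_square_summable, 1.0))  # drifts toward 0 as n grows
```

With $c_k = 1/k$ the partial products at $u = 1$ settle near a positive value, in line with essential convergence; with $c_k = 1/\sqrt{k}$ they are bounded above by $\exp(-\tfrac{1}{2}\sum_{k \le n} 1/k)$ and keep shrinking.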
II. STABILITY. Given sequences $a_n$ and $b_n \uparrow \infty$, we seek conditions
for a.s. stability of sequences $S_n$ of sums of independent r.v.’s. On
account of the corollary to the symmetrization lemma, a first condition is
that $a_n = \mu\big(\frac{S_n}{b_n}\big) + o(1)$. Thus, it suffices to take $a_n = \mu\big(\frac{S_n}{b_n}\big)$ and
investigate conditions under which $\frac{S_n}{b_n} - \mu\big(\frac{S_n}{b_n}\big) \xrightarrow{\text{a.s.}} 0$.

We have $b_n \uparrow \infty$ and, moreover, assume that there exists a subsequence
$b_{n_k}$ and finite numbers $c, c'$ such that, for all $k$ sufficiently large,
$1 < c' \le \frac{b_{n_{k+1}}}{b_{n_k}} \le c < \infty$. Roughly speaking, this assumption means that the
sequence $b_n$ does not increase too fast, and it is always satisfied (with an
arbitrary $c > 1$) when $\frac{b_{n+1}}{b_n} \to 1$. Let $S_{n_0} = 0$ and $T_k = \frac{S_{n_k} - S_{n_{k-1}}}{b_{n_k}}$.

A. A.S. STABILITY CRITERION. (i) $\frac{S_n}{b_n} - \mu\big(\frac{S_n}{b_n}\big) \xrightarrow{\text{a.s.}} 0$ if, and only if,
(ii) $T_k - \mu T_k \xrightarrow{\text{a.s.}} 0$ as $k \to \infty$ or, equivalently, (ii′) for every $\varepsilon > 0$,
$$\sum P[|T_k - \mu T_k| \ge \varepsilon] < \infty.$$


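Proof. Since the $T_k$ are nonoverlapping sums of independent r.v.’s,
it follows, by 16.3A, that conditions (ii) and (ii′) are equivalent. And, on
account of the symmetrization lemma, it suffices to prove equivalence
of (i) and (ii) for symmetric summands; then the medians which figure
in these conditions vanish.

If $\frac{S_n}{b_n} \xrightarrow{\text{a.s.}} 0$, then
$$\frac{S_{n_k}}{b_{n_k}} \xrightarrow{\text{a.s.}} 0 \text{ as } k \to \infty, \quad T_k = \frac{S_{n_k}}{b_{n_k}} - \frac{b_{n_{k-1}}}{b_{n_k}} \cdot \frac{S_{n_{k-1}}}{b_{n_{k-1}}} \xrightarrow{\text{a.s.}} 0$$
and the “only if” assertion is proved.

Conversely, if $T_k \xrightarrow{\text{a.s.}} 0$, then, by the Toeplitz lemma,
$$\frac{S_{n_k}}{b_{n_k}} = \frac{1}{b_{n_k}} \sum_{j=1}^k b_{n_j} T_j \xrightarrow{\text{a.s.}} 0.$$
Furthermore, upon setting $U_k = \max_{n_{k-1} < n \le n_k} \frac{|S_n - S_{n_{k-1}}|}{b_{n_k}}$ and applying
P. Lévy’s inequality we obtain, for every $\varepsilon > 0$,
$$\sum P[U_k \ge \varepsilon] \le 2\sum P[|T_k| \ge \varepsilon] < \infty,$$
so that $U_k \xrightarrow{\text{a.s.}} 0$. Therefore, for $n_{k-1} < n \le n_k$,
$$\Big|\frac{S_n}{b_n}\Big| \le \frac{|S_n - S_{n_{k-1}}|}{b_{n_k}} \cdot \frac{b_{n_k}}{b_{n_{k-1}}} + \Big|\frac{S_{n_{k-1}}}{b_{n_{k-1}}}\Big| \le cU_k + \Big|\frac{S_{n_{k-1}}}{b_{n_{k-1}}}\Big| \xrightarrow{\text{a.s.}} 0,$$
and the “if” assertion is proved.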


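COROLLARY 1. If $|X_n| < b_n$, then $\frac{S_n - ES_n}{b_n} \xrightarrow{\text{a.s.}} 0$ if, and only if,
$T_k - ET_k \xrightarrow{\text{a.s.}} 0$ as $k \to \infty$ or, equivalently, for every $\varepsilon > 0$,
$$\sum_k P[|T_k - ET_k| \ge \varepsilon] < \infty.$$

Proof. The “only if” assertion is proved as that of the foregoing
criterion. As for the “if” assertion, set $X_{nk} = (X_n - EX_n)/b_{n_k}$, $n_{k-1} <
n \le n_k$, so that $\sum_n X_{nk} = T_k - ET_k \xrightarrow{\text{a.s.}} 0$. Note that $|X_{nk}| < 2$ and
apply 13.5, 3° and 18.1a. It follows that
$$|\mu T_k - ET_k| \le \sqrt{2\sigma^2 T_k} \to 0,$$
so that $T_k - \mu T_k \xrightarrow{\text{a.s.}} 0$ and, by the foregoing criterion, $\frac{S_n}{b_n} - \mu\big(\frac{S_n}{b_n}\big) \xrightarrow{\text{a.s.}} 0$. But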
Ss Sn Snz ; ; 
Ge a = nee (Sn° = o°Sn), 

since, for 7,1 <” S mk; 

ls foe Ae 

A sw == 3 ft? — 0. 

c7 b, 1 ba? pa fy. oe 

Therefore, $\frac{S_n - ES_n}{b_n} \xrightarrow{\text{a.s.}} 0$, and the proof is concluded.
COROLLARY 2. If the $X_n$ are centered at expectations and $\sum \frac{\sigma^2 X_n}{b_n^2} < \infty$,
then $\frac{S_n}{b_n} \xrightarrow{\text{a.s.}} 0$.

Let $\bar{X}_n$ be $X_n$ truncated at $b_n$, and set $\bar{S}_n = \sum_{k=1}^n \bar{X}_k$, $\bar{T}_k = \frac{\bar{S}_{n_k} - \bar{S}_{n_{k-1}}}{b_{n_k}}$.
Since, by Tchebichev’s inequality,
$$\sum P[\bar{X}_n \ne X_n] = \sum P[|X_n| \ge b_n] \le \sum \frac{\sigma^2 X_n}{b_n^2} < \infty,$$
it follows, by the equivalence lemma, that the sequences $\frac{S_n}{b_n}$ and $\frac{\bar{S}_n}{b_n}$ are
tail-equivalent. But $E\bar{S}_n/b_n \to 0$ since $|E\bar{X}_k - EX_k| \le \sigma^2 X_k/b_k$ while


2 Xn mk g@X, 
2 
k 


Dn N>nk—-1 bn 


Nk 
&Pi|T.;2ds x = 
nN>Nk-1 

so that 
— 0X, 
&’> Pl Tx. | zed 52 


n 


< 0, 


Corollary 1 applies, $\frac{\bar{S}_n}{b_n} \xrightarrow{\text{a.s.}} 0$ and, therefore, $\frac{S_n}{b_n} \xrightarrow{\text{a.s.}} 0$.
*§ 19. EXPONENTIAL BOUNDS AND NORMED SUMS


In this section, the r.v.’s $X_n$, $n = 1, 2, \dots$, are independent and
centered at expectations with variance $\sigma_n^2 = \sigma^2 X_n = EX_n^2$; and
$S_n = \sum_{k=1}^n X_k$ are their consecutive sums, so that $ES_n = 0$,
$s_n^2 = \sigma^2 S_n = \sum_{k=1}^n \sigma_k^2$. We exclude the trivial case of degenerate summands.

19.1 Exponential bounds. Kolmogorov’s inequalities led, in Section
17, to asymptotic properties of sums $S_n$. His inequalities below, where
to simplify the writing we drop the subscript $n$, will lead to deeper
results but under more restrictive assumptions.


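A. EXPONENTIAL BOUNDS. Let $c = \max_{k \le n} \frac{|X_k|}{s}$ and let $\varepsilon > 0$.

(i) If $\varepsilon c \le 1$, then $P\big[\frac{S}{s} > \varepsilon\big] < \exp\big[-\frac{\varepsilon^2}{2}\big(1 - \frac{\varepsilon c}{2}\big)\big]$ and, if $\varepsilon c \ge 1$,
then $P\big[\frac{S}{s} > \varepsilon\big] < \exp\big[-\frac{\varepsilon}{4c}\big]$.

(ii) Given $\gamma > 0$, if $c = c(\gamma)$ is sufficiently small and $\varepsilon = \varepsilon(\gamma)$ is
sufficiently large, then $P\big[\frac{S}{s} > \varepsilon\big] > \exp\big[-\frac{\varepsilon^2}{2}(1 + \gamma)\big]$.

Proof. 1° Let $t > 0$, $|X| \le c < \infty$, $EX = 0$ and $\sigma^2 = \sigma^2 X$. Since
$$|EX^n| \le c^{n-2}\sigma^2 \ (n \ge 2), \quad Ee^{tX} = 1 + \frac{t^2}{2!}EX^2 + \frac{t^3}{3!}EX^3 + \cdots,$$
it follows that, for $tc \le 1$,
$$Ee^{tX} \le 1 + \frac{t^2\sigma^2}{2}\Big(1 + \frac{tc}{3} + \frac{t^2c^2}{3 \cdot 4} + \cdots\Big) < 1 + \frac{t^2\sigma^2}{2}\Big(1 + \frac{tc}{2}\Big) < \exp\Big[\frac{t^2\sigma^2}{2}\Big(1 + \frac{tc}{2}\Big)\Big]$$
and
$$Ee^{tX} \ge 1 + \frac{t^2\sigma^2}{2}\Big(1 - \frac{tc}{3} - \frac{t^2c^2}{3 \cdot 4} - \cdots\Big) > 1 + \frac{t^2\sigma^2}{2}\Big(1 - \frac{tc}{2}\Big) > \exp\Big[\frac{t^2\sigma^2}{2}(1 - tc)\Big].$$
Replacing $X$ by $\frac{X_k}{s}$, setting $S' = \frac{S}{s}$, and taking into account that
$$Ee^{tS'} = \prod_{k=1}^n E\exp\Big[\frac{tX_k}{s}\Big],$$
we obtain
$$(1) \quad \exp\Big[\frac{t^2}{2}(1 - tc)\Big] < Ee^{tS'} < \exp\Big[\frac{t^2}{2}\Big(1 + \frac{tc}{2}\Big)\Big], \quad tc \le 1.$$
Inequalities (i) follow then from
$$P[S' > \varepsilon] \le e^{-t\varepsilon}Ee^{tS'} < \exp\Big[-t\varepsilon + \frac{t^2}{2}\Big(1 + \frac{tc}{2}\Big)\Big]$$
where $t$ is replaced by $\varepsilon$ or $\frac{1}{c}$ according as $\varepsilon c \le 1$ or $\ge 1$.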
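
The upper bounds (i) can be compared with an exact tail in the simplest bounded case. The sketch below assumes fair $\pm 1$ summands, so that $s = \sqrt{n}$ and $c = 1/\sqrt{n}$; the function names are hypothetical, and the comparison illustrates the inequality without replacing the proof.

```python
import math


def rademacher_tail(n, eps):
    """Exact P[S_n / s_n > eps] for n fair +/-1 summands, where s_n = sqrt(n)."""
    threshold = eps * math.sqrt(n)
    # S_n = 2*heads - n, so count the outcomes with 2*heads - n > threshold.
    favourable = sum(math.comb(n, h) for h in range(n + 1) if 2 * h - n > threshold)
    return favourable / 2 ** n


def upper_exponential_bound(n, eps):
    """Right-hand side of (i) with c = max |X_k| / s_n = 1 / sqrt(n)."""
    c = 1.0 / math.sqrt(n)
    if eps * c <= 1:
        return math.exp(-(eps ** 2 / 2) * (1 - eps * c / 2))
    return math.exp(-eps / (4 * c))


for eps in (0.5, 1.0, 2.0, 3.0):
    print(eps, rademacher_tail(100, eps), upper_exponential_bound(100, eps))
```

For these summands the bound is far from tight at moderate $\varepsilon$; its interest lies in holding uniformly for all bounded centered summands with the same ratio $c$.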


2° The proof of inequality (ii) is much more involved. Let $\alpha$ and
$\beta$ be two positive numbers less than 1; they will be selected later in
terms of the given number $\gamma$. According to (1), we can take $c$
sufficiently small ($< \alpha/t$) so as to have
$$(2) \quad Ee^{tS'} > \exp\Big[\frac{t^2}{2}(1 - \alpha)\Big].$$
On the other hand, setting $q(x) = P[S' > x]$ and integrating by parts,
we have
$$Ee^{tS'} = -\int e^{tx}\,dq(x) = t\int e^{tx}q(x)\,dx.$$
We decompose the interval $(-\infty, +\infty)$ of integration into the five intervals
$I_1 = (-\infty, 0]$, $I_2 = (0, t(1 - \beta)]$, $I_3 = (t(1 - \beta), t(1 + \beta)]$,
$I_4 = (t(1 + \beta), 8t]$ and $I_5 = (8t, +\infty)$ and search for upper bounds of
the integral over $I_1$ and $I_5$ and over $I_2$ and $I_4$. We have
$$J_1 = t\int_{-\infty}^0 e^{tx}q(x)\,dx < t\int_{-\infty}^0 e^{tx}\,dx = 1.$$
On account of (i), we have on $I_5$, for $8tc < 1$,
$$q(x) < \exp\Big[-\frac{x}{4c}\Big] < \exp[-2tx] \quad \text{for } x \ge \frac{1}{c},$$
$$q(x) < \exp\Big[-\frac{x^2}{2}\Big(1 - \frac{xc}{2}\Big)\Big] \le \exp\Big[-\frac{x^2}{4}\Big] < \exp[-2tx] \quad \text{for } x < \frac{1}{c}.$$
Therefore, for $c$ sufficiently small ($< 1/8t$),
$$J_5 = t\int_{8t}^\infty e^{tx}q(x)\,dx < t\int_{8t}^\infty e^{-tx}\,dx < 1$$
and
$$(3) \quad J_1 + J_5 < 2.$$


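On the intervals $I_2$ and $I_4$ we have $x < \frac{1}{c}$ for $c$ sufficiently small and,
by (i),
$$e^{tx}q(x) < \exp\Big[tx - \frac{x^2}{2}\Big(1 - \frac{xc}{2}\Big)\Big] \le \exp\Big[tx - \frac{x^2}{2}(1 - 4tc)\Big] = e^{g(x)}.$$
The quadratic expression $g(x)$ attains its maximum for $x = \frac{t}{1 - 4tc}$
which, for $c < \beta/4t(1 + \beta)$, lies in $I_3$. Therefore, for $c$ sufficiently small
and $x \in I_2$,
$$g(x) \le g(t(1 - \beta)) = \frac{t^2}{2}(1 - \beta)(1 + \beta + 4tc - 4tc\beta) < \frac{t^2}{2}\Big(1 - \frac{\beta^2}{2}\Big)$$
and, then,
$$J_2 = t\int_0^{t(1-\beta)} e^{tx}q(x)\,dx < t\int_0^{t(1-\beta)} e^{g(x)}\,dx < t^2\exp\Big[\frac{t^2}{2}\Big(1 - \frac{\beta^2}{2}\Big)\Big];$$
similarly,
$$J_4 = t\int_{t(1+\beta)}^{8t} e^{tx}q(x)\,dx < t\int_{t(1+\beta)}^{8t} e^{g(x)}\,dx < 8t^2\exp\Big[\frac{t^2}{2}\Big(1 - \frac{\beta^2}{2}\Big)\Big].$$
Now let $\alpha = \frac{\beta^2}{4}$ and $t = \frac{\varepsilon}{1 - \beta}$, so that, by (2),
$$(4) \quad J_2 + J_4 < 9t^2\exp\Big[\frac{t^2}{2}\Big(1 - \frac{\beta^2}{2}\Big)\Big] < \frac{9\varepsilon^2}{(1 - \beta)^2}\exp\Big[-\frac{\varepsilon^2\beta^2}{8(1 - \beta)^2}\Big]E\exp\Big[\frac{\varepsilon}{1 - \beta}S'\Big].$$
Since the last expectation and the inverse of its coefficient increase
indefinitely as $\varepsilon \to \infty$, it follows, by (3) and (4), that for $\varepsilon$ sufficiently large
$$J_1 + J_5 < 2 < \tfrac{1}{4}Ee^{tS'}, \quad J_2 + J_4 < \tfrac{1}{4}Ee^{tS'}.$$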
Then
$$J_3 = t\int_{t(1-\beta)}^{t(1+\beta)} e^{tx}q(x)\,dx > \tfrac{1}{2}Ee^{tS'},$$
a fortiori,
$$2t^2\beta e^{t^2(1+\beta)}q(\varepsilon) > \tfrac{1}{2}\exp\Big[\frac{t^2}{2}(1 - \alpha)\Big]$$
and, since as $\varepsilon \to \infty$, $\frac{1}{4t^2\beta}\exp\big[\frac{\alpha t^2}{2}\big] \to \infty$, replacing $t$ by its value, it
follows that, for $\varepsilon$ sufficiently large,
$$q(\varepsilon) > \frac{1}{4t^2\beta}e^{\alpha t^2/2}\exp\Big[-\frac{t^2}{2}(1 + 2\alpha + 2\beta)\Big] > \exp\Big[-\frac{\varepsilon^2}{2} \cdot \frac{1 + 2\alpha + 2\beta}{(1 - \beta)^2}\Big].$$
But, given $\gamma > 0$, we can select $\beta > 0$ so as to have
$$\frac{1 + 2\beta + \frac{\beta^2}{2}}{(1 - \beta)^2} \le 1 + \gamma.$$
Therefore, for $c = c(\gamma)$ sufficiently small and $\varepsilon = \varepsilon(\gamma)$ sufficiently large,
$$q(\varepsilon) > \exp\Big[-\frac{\varepsilon^2}{2}(1 + \gamma)\Big]$$
and (ii) is proved.

*19.2 Stability. The a.s. stability criterion (which is due to Prokhorov
for $b_n = n$) is a criterion in the sense that it is both necessary
and sufficient. Yet, it is not satisfactory, since, because of the independence
of the summands, it has to be expected that a satisfactory criterion
ought to be expressed in terms of individual summands and not in terms
of nonoverlapping sums. The nearest to this requirement is a criterion
in terms of variances (due also to Prokhorov for $b_n = n$), valid when
the summands are suitably bounded, and whose proof is based upon the
exponential bounds.

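Let $b_n \uparrow \infty$, $1 < c' \le \frac{b_{n_{k+1}}}{b_{n_k}} \le c < \infty$ and set $T_k = \frac{S_{n_k} - S_{n_{k-1}}}{b_{n_k}}$,
$t_k^2 = \sigma^2 T_k = \frac{1}{b_{n_k}^2}\sum_{n_{k-1} < n \le n_k} \sigma^2 X_n$. We write $\log_2$ for $\log\log$.

A. If $\frac{|X_n|}{b_n} = o(\log_2^{-1} b_n)$ then $\frac{S_n}{b_n} \xrightarrow{\text{a.s.}} 0$ if, and only if, for every
$\varepsilon > 0$, the series (i) $\sum \exp\big[-\frac{\varepsilon^2}{t_k^2}\big]$ converges.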


Proof. For $n$ sufficiently large $\frac{|X_n|}{b_n} < 1$, so that corollary 1 of the
a.s. stability criterion applies: for every $\varepsilon > 0$
$$\text{(ii)} \quad \sum P[|T_k| > \varepsilon] < \infty.$$
We have to prove that convergence of series (i) for some $\varepsilon$ implies that
of series (ii) for the same or distinct $\varepsilon$; and conversely. On the other
hand, elementary computations show that, setting $c_k = \max_{n_{k-1} < n \le n_k} \frac{|X_n|}{b_{n_k}}$,
the assumption made implies that $c_k = \frac{\alpha_k}{\log k}$ with $\alpha_k \to 0$ as $k \to \infty$.

We use now the upper exponential bounds and observe that for
$\frac{\varepsilon c_k}{t_k^2} \ge 1$ and $k$ sufficiently large
$$P[|T_k| > \varepsilon] < 2\exp\Big[-\frac{\varepsilon}{4c_k}\Big] = 2\exp\Big[-\frac{\varepsilon}{4\alpha_k}\log k\Big] = 2\Big(\frac{1}{k}\Big)^{\varepsilon/4\alpha_k} < \frac{2}{k^2},$$
$$\exp\Big[-\frac{\varepsilon^2}{4t_k^2}\Big] \le \exp\Big[-\frac{\varepsilon}{4c_k}\Big] = \exp\Big[-\frac{\varepsilon}{4\alpha_k}\log k\Big] < \frac{1}{k^2},$$
so that the corresponding sums in (i) and (ii) converge and we can
neglect all those terms for which $\frac{\varepsilon c_k}{t_k^2} \ge 1$. Since for $\frac{\varepsilon c_k}{t_k^2} < 1$
$$P[|T_k| > \varepsilon] < 2\exp\Big[-\frac{\varepsilon^2}{2t_k^2}\Big(1 - \frac{\varepsilon c_k}{2t_k^2}\Big)\Big] \le 2\exp\Big[-\frac{\varepsilon^2}{4t_k^2}\Big],$$
it follows that convergence of series (i) for every $\varepsilon > 0$ entails that of
series (ii) for every $\varepsilon > 0$. Conversely, if series (ii) converges, then
$T_k \xrightarrow{\text{a.s.}} 0$ and $t_k^2 \to 0$, so that, for $k$ sufficiently large, $\frac{\varepsilon}{t_k}$ is as large as we
please and $\frac{c_k}{t_k} < \frac{t_k}{\varepsilon}$ is as small as we please. Therefore, the exponential
bound is valid with, say, $\gamma = 1$, and
$$P[|T_k| > \varepsilon] > 2\exp\Big[-\frac{\varepsilon^2}{t_k^2}\Big],$$
so that convergence of series (ii) for every $\varepsilon > 0$ entails that of series
(i) for every $\varepsilon > 0$, and the proof is concluded.
COROLLARY. If, for an $r \ge 1$, $\sum \frac{E|X_n|^{2r}}{n^{r+1}} < \infty$, then $\frac{S_n}{n} \xrightarrow{\text{a.s.}} 0$.

For $r = 1$, this proposition coincides with Corollary 2 of the a.s. stability
criterion, so that it suffices to consider the case $r > 1$ (due to Brunk).

Proof. Let $\bar{X}_n = X_n$ or 0 according as $|X_n| < n^{\frac{r+1}{2r}}$ or $\ge n^{\frac{r+1}{2r}}$, so
that
$$\frac{|\bar{X}_n|}{n} = o(\log_2^{-1} n), \quad \sum \frac{E|\bar{X}_n|^{2r}}{n^{r+1}} \le \sum \frac{E|X_n|^{2r}}{n^{r+1}} < \infty,$$
and, by Tchebichev’s inequality,
$$\sum P[\bar{X}_n \ne X_n] = \sum P\big[|X_n| \ge n^{\frac{r+1}{2r}}\big] \le \sum \frac{E|X_n|^{2r}}{n^{r+1}} < \infty.$$
Therefore, on account of the equivalence lemma, it suffices to prove
that the assertion holds for r.v.’s $X_n$ which satisfy the assumption made
in A.
But, upon applying for $r > 1$ the inequality $E^r|X| \le E|X|^r$, setting
$n_k = 2^k$, and applying the $c_r$-inequality with $n_k - n_{k-1}$ summands, we
have, summing over $n = n_{k-1} + 1, \dots, n_k$,
$$t_k^{2r} = \Big(\frac{1}{n_k^2}\sum \sigma^2 X_n\Big)^r \le \frac{(n_k - n_{k-1})^{r-1}}{n_k^{2r}}\sum E^r|X_n|^2 \le \frac{1}{n_k^{r+1}}\sum E|X_n|^{2r} \le \sum \frac{E|X_n|^{2r}}{n^{r+1}}.$$
Therefore,
$$\sum_k t_k^{2r} \le \sum_n \frac{E|X_n|^{2r}}{n^{r+1}} < \infty$$
and, since we have $\exp\big[-\frac{\varepsilon^2}{t_k^2}\big] < t_k^{2r}$ for $k$ sufficiently large, criterion
A is satisfied, and the proof is concluded.

*19.3 Law of the iterated logarithm. We say that a numerical sequence
$b_n$ belongs to the upper class or to the lower class of a sequence
$S_n$ of r.v.’s, according as $P[S_n > b_n \text{ i.o.}] = 0$ or 1. A priori, there may
be sequences $b_n$ which belong to neither of these two classes. However,
if $S_n$ is an essentially divergent sequence of consecutive sums of independent
r.v.’s, then every sequence $b_n$ belongs to one of the foregoing
two classes. The problem which arises is that of corresponding criteria.
Relatively little is known about its general solution (in the case of unbounded
summands), and the proofs of what is known are quite involved;
the best results are due to Feller. The basic known result was
first obtained by Khintchine (also P. Lévy) in the Bernoulli case as a
strengthening of consecutive improvements of Borel’s strong law of
large numbers and, then, was extended by Kolmogoroff (also Cantelli)
to more general cases, as follows:


A. LAW OF THE ITERATED LOGARITHM. If
$$s_n^2 \to \infty \quad \text{and} \quad \frac{|X_n|}{s_n} = o(\log_2^{-1/2} s_n^2), \quad t_n = (2\log_2 s_n^2)^{1/2},$$
then
$$P\Big[\limsup \frac{S_n}{s_nt_n} = 1\Big] = 1.$$
In other words, for every $\delta > 0$, the sequence $(1 + \delta)s_nt_n$ belongs to
the upper class of the sequence $S_n$, while the sequence $(1 - \delta)s_nt_n$ belongs
to the lower class; clearly, it suffices to prove these assertions for
$\delta$ arbitrarily small.
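
The normalization $s_nt_n$ can be observed on a simulated path. The sketch below assumes fair $\pm 1$ summands, so that $s_n^2 = n$ and $t_n = (2\log\log n)^{1/2}$; the function name `lil_ratios` and the seed are arbitrary choices. Convergence in the law of the iterated logarithm is very slow, so a single finite path only indicates the order of magnitude of the fluctuations.

```python
import math
import random


def lil_ratios(n_steps, seed=0):
    """Simulate S_n / (s_n t_n) for fair +/-1 steps, with s_n^2 = n and
    t_n = sqrt(2 log log n); only n >= 3 is kept so that log log n > 0."""
    rng = random.Random(seed)
    s = 0
    ratios = []
    for n in range(1, n_steps + 1):
        s += rng.choice((-1, 1))
        if n >= 3:
            ratios.append(s / math.sqrt(2 * n * math.log(math.log(n))))
    return ratios


ratios = lil_ratios(200_000)
# Largest and smallest normalized values after an initial stretch of the path.
print(max(ratios[10_000:]), min(ratios[10_000:]))
```

On a typical run the printed extremes are of order 1 in absolute value, which is the scale predicted by the theorem; the exact figures depend on the seed.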
We observe that, since the assumptions remain valid if every $X_n$ is
replaced by $-X_n$, the conclusion yields
$$P\Big[\liminf \frac{S_n}{s_nt_n} = -1\Big] = 1$$
and, therefore, it holds for both sequences $S_n$ and $|S_n|$ if it holds for
the first one.

Proof. Since $s_n^2 \to \infty$ and $\frac{s_{n+1}^2}{s_n^2} = 1 + o(\log_2^{-1} s_n^2) \to 1$, it follows
that, for every $c > 1$, there exists a sequence $n_k = n_k(c) \uparrow \infty$ as
$k \to \infty$, such that $s_{n_k} \sim c^k$. Let $\delta, \delta', \delta''$ be positive numbers.

1° We prove that the sequences $(1 + \delta)s_nt_n$ belong to the upper
class of the sequence $S_n$, by proving the same for the sequence
$S^*_n = \max_{k \le n} S_k$. For


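$$P[S_n > (1 + \delta)s_nt_n \text{ i.o.}] \le P[S^*_{n_k} > (1 + \delta)s_{n_{k-1}}t_{n_{k-1}} \text{ i.o.}]$$
where
$$(1 + \delta)s_{n_{k-1}}t_{n_{k-1}} \sim \frac{1 + \delta}{c}s_{n_k}t_{n_k};$$
hence, taking $\delta' < \delta$, we can select $c > 1$ so that $\frac{1 + \delta}{c} > 1 + \delta'$ and
$$P[S^*_{n_k} > (1 + \delta)s_{n_{k-1}}t_{n_{k-1}} \text{ i.o.}] \le P[S^*_{n_k} > (1 + \delta')s_{n_k}t_{n_k} \text{ i.o.}].$$
Thus, the assertion will follow from the Cantelli lemma if we prove that
$$\sum P[S^*_{n_k} > (1 + \delta')s_{n_k}t_{n_k}] < \infty.$$
But, by the remark at the end of 18.1, the general term of this series
is bounded by $2P\big[S_{n_k} > \big(1 + \delta' - \frac{\sqrt{2}}{t_{n_k}}\big)s_{n_k}t_{n_k}\big]$ where $1 + \delta' - \frac{\sqrt{2}}{t_{n_k}} \to 1 + \delta'$.
Therefore, for $\delta'' < \delta'$ and $k$ sufficiently large,
$$P\Big[S_{n_k} > \Big(1 + \delta' - \frac{\sqrt{2}}{t_{n_k}}\Big)s_{n_k}t_{n_k}\Big] \le P[S_{n_k} > (1 + \delta'')s_{n_k}t_{n_k}],$$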
and it suffices to prove that the right-hand side is general term of a
convergent series. This follows by applying the first upper exponential
bound with $\varepsilon_k = (1 + \delta'')t_{n_k}$ and $c_k = \max_{j \le n_k} \frac{|X_j|}{s_{n_k}}$, valid for $k$
sufficiently large since $c_kt_{n_k} \to 0$, so that
$$P[S_{n_k} > (1 + \delta'')s_{n_k}t_{n_k}] \le \exp\big[-\tfrac{1}{2}(1 - \varepsilon_kc_k/2)(1 + \delta'')^2t_{n_k}^2\big] < \exp[-(1 + \delta'')\log_2 s_{n_k}^2] \sim \frac{1}{(2k\log c)^{1+\delta''}}$$
and the assertion is proved. Furthermore, according to the considerations
which follow the statement of the theorem, this assertion entails
that $P[|S_n| > (1 + \delta)s_nt_n \text{ i.o.}] = 0$.

2° It remains to prove that the sequences $(1 - \delta')s_nt_n$ belong to the
lower class of the sequence $S_n$ where we will take $1 > \delta' > \delta$. This
assertion will be a fortiori true if we prove that it holds for a sequence $S_{n_k}$.
Let
$$u_k^2 = s_{n_k}^2 - s_{n_{k-1}}^2 \sim s_{n_k}^2\Big(1 - \frac{1}{c^2}\Big), \quad v_k = (2\log_2 u_k^2)^{1/2} \sim (2\log_2 s_{n_k}^2)^{1/2} = t_{n_k}$$
and set
$$A_k = [S_{n_k} - S_{n_{k-1}} > (1 - \delta)u_kv_k].$$


We prove first that $P[A_k \text{ i.o.}] = 1$, as follows: The sums $S_{n_k} - S_{n_{k-1}}$,
being nonoverlapping sums of independent r.v.’s, are independent and,
by the Borel criterion, it suffices to prove that $\sum PA_k = \infty$. But,
$\varepsilon_k = (1 - \delta)v_k \to \infty$ while $c_k = \max_{n_{k-1} < n \le n_k} (|X_n|/u_k) \to 0$ as $k \to \infty$; hence
the lower exponential bound for $PA_k$ applies with $1 + \gamma = \frac{1}{1 - \delta}$.
Therefore,
$$PA_k > \exp[-\tfrac{1}{2}(1 + \gamma)(1 - \delta)^2v_k^2] = \exp[-(1 - \delta)\log_2 u_k^2] \sim \frac{1}{(2k\log c)^{1-\delta}},$$
the series $\sum PA_k$ diverges, and $P[A_k \text{ i.o.}] = 1$.
On the other hand, if $B_k = [|S_{n_{k-1}}| \le 2s_{n_{k-1}}t_{n_{k-1}}]$, then, according
to the end of 1°, $P[B_k^c \text{ i.o.}] = 0$; thus, from some value $n = n(\omega)$ on
$|S_{n_{k-1}}(\omega)| \le 2s_{n_{k-1}}t_{n_{k-1}}$ except for $\omega$ belonging to the null event $[B_k^c \text{ i.o.}]$.
Therefore, $P[A_kB_k \text{ i.o.}] = 1$, and this entails the assertion. For,
$$A_kB_k \subset [S_{n_k} > (1 - \delta)u_kv_k - 2s_{n_{k-1}}t_{n_{k-1}}],$$
$$(1 - \delta)u_kv_k - 2s_{n_{k-1}}t_{n_{k-1}} \sim \Big\{(1 - \delta)\Big(1 - \frac{1}{c^2}\Big)^{1/2} - \frac{2}{c}\Big\}s_{n_k}t_{n_k}$$
and, if we take $c$ sufficiently large so that for $\delta' > \delta$
$$(1 - \delta)\Big(1 - \frac{1}{c^2}\Big)^{1/2} - \frac{2}{c} > 1 - \delta',$$
then
$$1 = P[A_kB_k \text{ i.o.}] \le P[S_{n_k} > (1 - \delta')s_{n_k}t_{n_k} \text{ i.o.}].$$
The proof is terminated.


COMPLEMENTS AND DETAILS 


As throughout this chapter, $S_n = \sum_{k=1}^n X_k$ and a.s. convergence is to a r.v.

1. If the ch.f. of a sum of two r.v.’s is the product of the ch.f.’s of the
summands, the summands may not be independent. Construct examples. Here
is one: $X$ is a Cauchy r.v., with ch.f. $e^{-|u|}$; consider $X + Y$ where $Y = cX$,
$c > 0$.

2. Let $X$, $Y$ be independent r.v.’s and let $r \ge 1$.

If $X$ and $Y$ are centered at expectations, then $E|X + Y|^r$ majorizes $E|X|^r$
and $E|Y|^r$. More generally, if, say, $A$ is an event defined on $X$, then
$E|X + Y|^rI_A \ge E|X|^rI_A$.

If $E|X + Y|^r$ is finite, so are $E|X|^r$ and $E|Y|^r$. (Since $|x|^r = |E(x + Y)|^r
\le E|x + Y|^r$, it follows that
$$E|X + Y|^rI_A = \int_A dF_X(x)\Big\{\int |x + y|^r\,dF_Y(y)\Big\} \ge \int_A |x|^r\,dF_X(x) = E|X|^rI_A.$$
For $r > 1$ the first assertion implies the second one. For $r = 1$, set $A =
[|X| < a]$ and observe that $E|X + Y| \ge E(|Y| - a)I_A = (E|Y| - a)PA$.)

3. Generalized Kolmogorov inequality. Let $X_1, X_2, \dots$ be independent r.v.’s
centered at expectations, and let $r \ge 1$. Set $C = [\sup_{k \le n} |S_k| \ge c]$ and prove
that
$$c^rPC \le E|S_n|^rI_C \le E|S_n|^r.$$
Apply to the same problems to which Kolmogorov’s inequality was applied.
For example, if $S_n \xrightarrow{r} S$, then $S_n \xrightarrow{\text{a.s.}} S$. (Set $C_k = [\sup_{j < k} |S_j| < c, |S_k| \ge c]$,
$S_0 = 0$. By 2, $E|S_n|^rI_C = \sum_{k=1}^n E|S_n|^rI_{C_k} \ge \sum_{k=1}^n E|S_k|^rI_{C_k} \ge c^rPC$.)
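
The chain of inequalities in Problem 3 can be confirmed by brute force before attempting the proof. The sketch below assumes fair $\pm 1$ summands (independent and centered at expectations) and integer $r$, so that every quantity is an exact fraction; `kolmogorov_terms` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product


def kolmogorov_terms(n, c, r):
    """Exact c^r P(C), E|S_n|^r I_C and E|S_n|^r for n fair +/-1 steps,
    where C = [max_{k<=n} |S_k| >= c] and r is a positive integer."""
    weight = Fraction(1, 2 ** n)
    p_c = on_c = full = Fraction(0)
    for signs in product((-1, 1), repeat=n):
        s, peak = 0, 0
        for x in signs:
            s += x
            peak = max(peak, abs(s))
        full += weight * abs(s) ** r
        if peak >= c:                      # the path belongs to the event C
            p_c += weight
            on_c += weight * abs(s) ** r
    return c ** r * p_c, on_c, full


for r in (1, 2, 3):
    print(r, kolmogorov_terms(8, 3, r))
```

For $r = 2$ the last term is $E S_n^2 = n$, which recovers the classical Kolmogorov inequality as a special case.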
4. Let $X_1, X_2, \dots$ be independent r.v.’s, and let $T_n^r = \sup_{k \le n} |S_k|^r$, $r \ge 1$.

If the $X_k$ are symmetric, then $ET_n^r \le 2E|S_n|^r$.
If the $X_k$ are centered at expectations, then $ET_n^r \le 2^{2r+1}E|S_n|^r$.
Extend to $n = \infty$ when $S_n \xrightarrow{r} S_\infty$. (If symmetric, then
$$ET_n^r = \int_0^\infty P[T_n^r \ge t]\,dt \le 2\int_0^\infty P[|S_n|^r \ge t]\,dt = 2E|S_n|^r.$$
If centered at expectations symmetrize; then
$$|S_k|^r \le 2^{r-1}\sup_k |S_k - S'_k|^r + 2^{r-1}|S'_k|^r.$$
Integrate over $X'_1, \dots, X'_n$, take sup, integrate over $X_1, \dots, X_n$, and apply
the first assertion.)
5. Let $X_1, X_2, \dots$ be independent r.v.’s centered at expectations, and let
$r \ge 1$. If $\sum E|X_n|^{2r}/n^{r+1} < \infty$, then $\frac{S_n}{n} \xrightarrow{\text{a.s.}} 0$. (Apply 4 and the elementary
inequality $\big(\sum_{k=1}^n a_k^2\big)^r \le n^{r-1}\sum_{k=1}^n |a_k|^{2r}$ to obtain $E|S_n|^{2r} \le cn^{r-1}\sum_{k=1}^n E|X_k|^{2r}$.
By Tchebichev’s inequality,
$$P[|S_{2^{k+1}} - S_{2^k}| > 2^k\varepsilon] \le c2^{r+1}\varepsilon^{-2r}\sum_{j=2^k+1}^{2^{k+1}} E|X_j|^{2r}/j^{r+1}.$$
Apply the a.s. stability criterion with $n_k = 2^k$.)

6. The series $\sum c_ne^{i\theta_n}$ where the $\theta_n$ are independent r.v.’s with $Ee^{i\theta_n} = 0$,
converges or diverges a.s., according as the series $\sum |c_n|^2$ converges or diverges.

7. If a series $\sum X_n$ of independent r.v.’s converges a.s., then by centering
the summands at the terms of some convergent series, the a.s. convergence and
the limit are preserved under all changes of the order of the summands. (Start
with a series which converges in q.m. Use the centering in the two series
criterion.)

8. A series $\sum X_n$ of independent r.v.’s with ch.f.’s $f_n$ converges a.s. whatever
be the order of summands if, and only if, $\sum |f_n - 1| < \infty$.

9. If a series $\sum X_n$ of independent r.v.’s is essentially divergent, then it
degenerates at infinity: $P[|S_n| < c] \to 0$ however large be $c > 0$. State the
dual form for essential convergence. (This is true for the symmetrized series.
Prove and apply: if $X$ and $X'$ are independent and identically distributed, then
$P^2[|X| < c] \le P[|X - X'| < 2c]$.)
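
The auxiliary inequality in the hint to Problem 9, read as $P^2[|X| < c] \le P[|X - X'| < 2c]$ for independent, identically distributed $X$ and $X'$, can be checked exactly on a discrete law. The three-point law and the helper name `both_sides` below are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product


def both_sides(dist, c):
    """Exact P^2[|X| < c] and P[|X - X'| < 2c] for i.i.d. X, X' with law `dist`."""
    p_inside = sum(p for x, p in dist.items() if abs(x) < c)
    p_close = sum(p * q for (x, p), (y, q) in product(dist.items(), repeat=2)
                  if abs(x - y) < 2 * c)
    return p_inside ** 2, p_close


# Hypothetical law: an atom at 0 and two far-apart atoms.
law = {0: Fraction(1, 2), 10: Fraction(1, 4), -20: Fraction(1, 4)}
print(both_sides(law, 1))  # (Fraction(1, 4), Fraction(3, 8))
```

In this example $P[|X| < 1] = 1/2$ exceeds $P[|X - X'| < 2] = 3/8$, so the square on the left-hand side cannot be dropped.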

10. Let $\sum X_n$ be a series of independent r.v.’s with ch.f.’s $f_n$.

If for a subsequence of integers $m \to \infty$ there exist r.v.’s $Y_m$ with ch.f. $g_m$
such that $S_m$ and $Y_m - S_m$ are independent and $|g_m|^2 \to |g|^2$ continuous at
the origin, then $\sum X_n$ is essentially convergent. (This follows from
$$\prod_{k=1}^m |f_k| \ge |g_m| \to |g| > \varepsilon > 0 \text{ in a neighborhood of the origin.)}$$


11. Smoothing by addition. Loosely speaking, a sum of independent r.v.’s
is at least as “smooth” as any of its summands. More precisely, continuity or
analyticity properties of the law of one of the summands continue to hold for
the law of the sum. Examples:

(a) If one of the summands has a continuous law so does the sum. (Introduce
the “concentration” $C_X$ defined by $C_X(l) = \max_x P[x \le X \le x + l]$,
$l \ge 0$. Observe that $C_X(0) = 0$ if, and only if, $F_X$ is continuous. By the
composition theorem for independent r.v.’s $X$ and $Y$, $C_{X+Y} \le C_X$, $C_{X+Y} \le C_Y$.)

(b) If one of the summands has an absolutely continuous law, so does the
sum. (In defining the concentration replace translates of segments of length $l$
by translates of Lebesgue sets of measure $l$.)

(c) If one of the summands has a strictly increasing d.f., so does the sum. 
What about unicity of medians? 
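
The concentration inequality of 11(a) can be tried out on finite laws, for which the maximum defining $C_X(l)$ is attained with the left endpoint at an atom. The helper names `concentration` and `convolve` and the two example laws below are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product


def concentration(dist, length):
    """C_X(l) = max_x P[x <= X <= x + l] for a finite law {value: prob}.
    For a finite law the maximum is attained with x at an atom."""
    return max(sum(p for y, p in dist.items() if x <= y <= x + length)
               for x in dist)


def convolve(dist_x, dist_y):
    """Law of X + Y for independent X and Y."""
    out = {}
    for (x, p), (y, q) in product(dist_x.items(), dist_y.items()):
        out[x + y] = out.get(x + y, Fraction(0)) + p * q
    return out


law_x = {0: Fraction(1, 2), 1: Fraction(1, 2)}
law_y = {0: Fraction(1, 3), 2: Fraction(1, 3), 5: Fraction(1, 3)}
law_sum = convolve(law_x, law_y)
for length in (0, 1, 2, 3):
    print(length, concentration(law_sum, length),
          concentration(law_x, length), concentration(law_y, length))
```

In each printed row the first value does not exceed either of the other two, as the composition theorem predicts.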

12. The symmetrization method reduces medians to zero and transforms
essentially convergent series into a.s. convergent ones. However, only
centering at medians does not yield a.s. convergence. In fact, let $\sum X_n$ be an
a.s. convergent series of independent summands. The sequence $\mu(S_n)$ of medians
may not converge. However, if $\sum X_n$ is essentially convergent and the
r.v. $Y$ is independent of all the $X_n$ and has a strictly increasing d.f., then, after
centering the $S_n + Y$ at medians, the series converges a.s.

(For the counterexample, take $X_0 = -1$ or $+1$ with same pr. 1/2; let
$0 < p_n < 1$ with $\sum p_n < \infty$ and, for $n \ge 1$, take $X_{2n-1}$ and $X_{2n}$ with values
$2(-1)^n$ of pr. $p_n$ and 0 of pr. $1 - p_n$. The sequence $S_n$ converges a.s., yet the
$S_n$ are odd integers with $\mu(S_{4n-1}) \ge 1$ and $\mu(S_{4n+1}) \le -1$. For the last assertion
use 11(c).)


13. The $X_n$ are not assumed to be independent. If $\frac{S_{n^2}}{n^2} \xrightarrow{\text{a.s.}} U$ and the $X_n$
are uniformly bounded, then $\frac{S_n}{n} \xrightarrow{\text{a.s.}} U$. What if $n^2$ is replaced by $n^k$ where
$k$ is a fixed integer? What if $n^2$ is replaced by $[q^n]$ with $q > 1$ arbitrarily close
to 1?

More generally, let $\sum P[|U_n - U| > \varepsilon]/n^\alpha < \infty$ for every $\varepsilon > 0$,
$\sum P[|X_n| > cn^\beta] < \infty$ for some $c > 0$, $0 < \alpha \le 1$, $\beta > 0$. If $\gamma \ge \alpha + \beta$,
then $U_n \xrightarrow{\text{a.s.}} U$, where $U_n = S_n/n^\gamma$.

(For the first assertion, the second part of the proof of Borel’s strong law of
large numbers (see Introductory Part) applies. For the second assertion, use
the following property of series: if $\sum |p_n|/n^\alpha < \infty$ with $0 < \alpha \le 1$, then
$\sum |p_{m_k}| < \infty$ for $m_{k+1} - m_k = O(m_k^\alpha)$.)


In what follows, the r.v.’s $X_1, X_2, \dots$, are independent and identically
distributed with common d.f. $F$, and ch.f. $f$ of a r.v. $X$; the trivial case of $X = 0$ a.s.
is excluded. In other words, repeated trials are performed on $X$.


14. Random selection. Let $\nu_1 < \nu_2 < \cdots$ be integer-valued r.v.’s such that
every $[\nu_k = n]$ is defined on $X_1, \dots, X_{n-1}$. The r.v.’s $X_{\nu_1}, X_{\nu_2}, \dots$, are
independent and identically distributed, as $X$. (Proceed as in
$$P[X_{\nu_1} < x_1, X_{\nu_2} < x_2] = \sum_{1 \le n_1 < n_2 < \infty} P[\nu_1 = n_1, X_{n_1} < x_1, \nu_2 = n_2, X_{n_2} < x_2]$$
$$= \sum_{1 \le n_1 < n_2 < \infty} P[\nu_1 = n_1, X_{n_1} < x_1, \nu_2 = n_2]P[X_{n_2} < x_2]$$
$$= P[X_1 < x_1]P[X_2 < x_2].)$$

15. Deviations from the median. If X is centered at a median, then


(2n + 1)! 


x, | = oS EX, _ ») = I)! 
E| DX: | 2 n oF Mil, gQn +1) = gQn +2) (2"n!)? 


This inequality is not necessarily true when $X$ is centered at its expectation.
Extend to nonidentically distributed $X_k$’s. (Divide $R^n$ into its $2^n$ “octants”
and consider the corresponding parts of the left-hand side. For a
counterexample, take $n = 3$, $X = 1$ with pr. 2/3 and $-2$ with pr. 1/3.)

16. Equidistribution of sums. If X is a lattice r v. with step —only possible 


values kh, k = 0, 41, ---—set M(g) = lim oad mal ~ iy I BURA) and otherwise 


set M(g) = jim — 1 “o(%) dx for those functions g on R for which either of the 
foregoing limite exists and j is finite. 


(a) In the first case M(e'”) = 1 or 0, according as u = 0 (mod =) or u 0 


(mod =) . In the second case M(e™*) = 1 or O according as u = C0 or u ~ 0. 
(b) For every u € R, 


n a.s. 
dé 1uSk —> M( ett) 


sim 


(This is immediate in the lattice case and if u = 0. Otherwise /(u) ~ 1 and 


HYP =-+5 RD PH )s- 
where c is finite. Use /3.) 
(c) The family G of functions g on R such that
$\frac{1}{n} \sum_{k=1}^{n} g(S_k) \xrightarrow{\text{a.s.}} M(g)$ contains all
almost periodic functions and functions with period $p$ Riemann-integrable on
$[0, p]$. (G contains all functions $g(x) = e^{iux}$. It is closed under
additions, multiplications by complex numbers, conjugations, and uniform
passages to the limit. M is a linear monotone operation on G.)

If $g_n \in G$ and $g_n \to g$, then $M(g_n) \to M(g)$. If $g'_n, g''_n \in G$
and $M(g'_n) - M(g''_n) \to 0$, then for every g such that
$g'_n \le g \le g''_n$ whatever be n, $g \in G$ and
$M(g) = \lim M(g'_n) = \lim M(g''_n)$.

(d) For X degenerate at an irrational $a$, the classical equidistribution
(modulo 1) of the fractional parts of $na$ follows: for g bounded with
$g(x) \to c$ finite as $x \to \pm\infty$,

\[ \frac{1}{n} \sum_{k=1}^{n} g(S_k) \xrightarrow{\text{a.s.}} c. \]

For every finite segment $I$, (no. of $S_1, \dots, S_n$ in $I$)$/n \xrightarrow{\text{a.s.}} 0$.
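
The equidistribution (modulo 1) in (d) can be observed numerically. The sketch below is an illustration only: the step $a = \sqrt{2}$, the sample size, and the function names are choices made here, not part of the problem.

```python
import cmath
import math

def average_exponential(a, m, n):
    """(1/n) * sum_{k=1}^{n} exp(2*pi*i*m*S_k) with S_k = k*a (X degenerate at a)."""
    return sum(cmath.exp(2j * math.pi * m * k * a) for k in range(1, n + 1)) / n

def fraction_in(a, lo, hi, n):
    """Proportion of the fractional parts of a, 2a, ..., na that fall in [lo, hi)."""
    hits = sum(1 for k in range(1, n + 1) if lo <= (k * a) % 1.0 < hi)
    return hits / n

a = math.sqrt(2.0)      # an irrational step (illustrative choice)
n = 10_000
avg = abs(average_exponential(a, 1, n))   # should be close to M = 0
prop = fraction_in(a, 0.0, 0.25, n)       # should be close to the length 0.25
```

For a periodic g of period 1 the averages of $g(S_k)$ depend only on the fractional parts of $ka$, which is why the two quantities above approach 0 and 0.25 respectively.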

17. Normal r.v.’s. Let X be normal with $EX = 0$, $EX^2 = 1$, let g on $R^n$ be
a finite Borel function, and set $\bar X = S_n/n$.

(a) If $g(x_1 + c, \dots, x_n + c) = g(x_1, \dots, x_n)$ for all $x_k, c \in R$,
then the ch.f. of the pair $\bar X$, $g(X_1, \dots, X_n)$ is
$f(u, v) = f_1(u) f_2(u, v)$ where $f_1(u) = e^{-u^2/2n}$ is ch.f. of $\bar X$ and

\[
f_2(u, v) = (2\pi)^{-n/2} \int_{R^n} h(x_1, \dots, x_n)\, dx_1 \cdots dx_n, \qquad
\log h(x_1, \dots, x_n) = -\frac{1}{2} \sum_{k=1}^{n} \Bigl(x_k - \frac{iu}{n}\Bigr)^2 + iv\, g(x_1, \dots, x_n).
\]

(b) If $f_2$ is analytic in $u$, then $\bar X$ and $g(X_1, \dots, X_n)$ are
independent. In particular, $\bar X$ is independent of
$\max_{j,k} |X_j - X_k|$ and of $\sum_{k=1}^{n} |X_k - \bar X|^r$, $r > 0$.
($f_2$ is independent of $u$: set $u = inc$ and use the translation property of g.)

(c) Let $p$ with or without affixes denote a pr. density with respect to the
Lebesgue measure. Let
$p(x) = \frac{1}{b\sqrt{2\pi}} \exp[-(x - a)^2/2b^2]$ be the pr. density
of $X_k$, and set
S? =" > (%— XY, § = S/V/n, ya Vie a, z= V5» 
k=1 


poe le 


Then the pr. density of Y is , the pr. density of Z converges to 


T 


1 _ a2 . . (tag? 
—— e *’”, the pr. density of (Y, Z) converges to e tv)? and 


2r 21 


Chapter VI 


CENTRAL LIMIT PROBLEM 


The Central Limit Problem of probability theory is the problem of 
convergence of laws of sequences of sums of r.v.’s. 

For more than two centuries a particular case—the Classical Limit 
Problem—has been the limit problem of probability theory. The pre- 
cise formulation of this case and its solution were obtained in the second 
quarter of this century. At the very time that this particular problem 
was receiving its definite answer, the much more general Central Limit 
Problem appeared, and was solved almost at once, thanks to the power- 
ful ch.f.’s tool and to the truncation and symmetrization methods. 


§ 20. DEGENERATE, NORMAL, AND POISSON TYPES 


20.1 First limit theorems and limit laws. Three limit theorems and
corresponding limit laws are at the origin of the classical limit problem.
Let $S_n$ be the number of occurrences of an event of pr. $p$ in $n$ independent
and identical trials; to avoid trivialities we assume that $pq \ne 0$,
where $q = 1 - p$. If $X_k$ denotes the indicator of the event in the $k$th
trial, then $S_n = \sum_{k=1}^{n} X_k$, $n = 1, 2, \dots$, where the summands are
independent and identically distributed indicators; this is the Bernoulli
case. Since $EX_k = p$, $EX_k^2 = p$ and, hence, $\sigma^2 X_k = p - p^2 = pq$, it
follows that

\[ ES_n = \sum_{k=1}^{n} EX_k = np, \qquad \sigma^2 S_n = \sum_{k=1}^{n} \sigma^2 X_k = npq. \]

The first limit theorem of pr. theory, published in 1713, says that
$S_n/n \xrightarrow{P} p$. Bernoulli found it by a direct but cumbersome analysis of
the asymptotic behavior of the “binomial pr.’s” $P[S_n = k] = C_n^k p^k q^{n-k}$,
$k = 0, 1, 2, \dots, n$.

Sharpening this analysis, de Moivre obtained the second limit theorem
which, in its integral form due to Laplace, says that

\[
P\Bigl[\frac{S_n - np}{\sqrt{npq}} < x\Bigr] \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\Bigl[-\frac{y^2}{2}\Bigr] dy,
\qquad -\infty \le x \le +\infty.
\]

The third limit theorem was obtained by Poisson, who modified the
Bernoulli case by assuming that the pr. $p = p_n$ depends upon the total
number $n$ of trials in such a manner that $np_n \to \lambda > 0$. Thus, writing
now $X_{nk}$ and $S_{nn}$ instead of $X_k$ and $S_n$, the Poisson case corresponds
to sequences of sums $S_{nn} = \sum_{k=1}^{n} X_{nk}$, $n = 1, 2, \dots$, where, for every
fixed $n$, the summands $X_{nk}$ are independent and identically distributed
indicators with $P[X_{nk} = 1] = \frac{\lambda}{n} + o\bigl(\frac{1}{n}\bigr)$. By a direct analysis of the
asymptotic behavior of the binomial pr.’s, much easier to carry than
the preceding ones, Poisson proved that

\[ P[S_{nn} = k] \to \frac{\lambda^k}{k!} e^{-\lambda}, \qquad k = 0, 1, 2, \dots. \]

Thus are born the three basic laws of pr. theory.
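
Poisson's limit can be watched on the binomial pr.'s themselves. The following is a minimal sketch, assuming $p_n = \lambda/n$ exactly (the $o(1/n)$ term is taken to be zero) and $\lambda = 2$; the function names and the truncation at `kmax` are illustrative choices. The comparison with $2\lambda^2/n$ in the usage note is Le Cam's later inequality, which is not part of this text.

```python
from math import comb, exp, factorial

def binomial_pr(n, p, k):
    """P[S_nn = k] for n independent indicators with pr. p."""
    return comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pr(lam, k):
    """Limit value lam^k e^{-lam} / k!."""
    return lam**k * exp(-lam) / factorial(k)

def total_gap(lam, n, kmax=60):
    """Sum over k <= kmax of |binomial pr. - Poisson pr.| with p_n = lam / n."""
    p = lam / n
    return sum(abs(binomial_pr(n, p, k) - poisson_pr(lam, k))
               for k in range(min(n, kmax) + 1))

lam = 2.0
gaps = {n: total_gap(lam, n) for n in (10, 100, 1000, 10000)}
```

The entries of `gaps` shrink roughly like $1/n$ and stay below $2\lambda^2/n$.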

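1° The degenerate law $\mathcal{L}(0)$ of a r.v. degenerate at 0 with d.f. having
one point of increase only at $x = 0$ and ch.f. reduced to 1.

2° The normal law $\mathcal{N}(0, 1)$ of a normal r.v. with d.f. defined by

\[ F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\Bigl[-\frac{y^2}{2}\Bigr] dy \]

and ch.f. given by

\[
f(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\Bigl[iux - \frac{x^2}{2}\Bigr] dx
     = \exp\Bigl[-\frac{u^2}{2}\Bigr] \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty - iu}^{+\infty - iu} \exp\Bigl[-\frac{z^2}{2}\Bigr] dz
     = \exp\Bigl[-\frac{u^2}{2}\Bigr].
\]

The well-known value of the last integral is obtained by using Cauchy’s
contour integration theorem.

3° The Poisson law $\mathcal{P}(\lambda)$ of a Poisson r.v. with d.f. defined by

\[ F(x) = e^{-\lambda} \sum_{k=0}^{[x]} \frac{\lambda^k}{k!} \]

and ch.f. given by

\[
f(u) = \sum_{k=0}^{\infty} e^{iuk}\, e^{-\lambda} \frac{\lambda^k}{k!}
     = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^{iu})^k}{k!}
     = e^{\lambda(e^{iu} - 1)}.
\]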


While the first two limit laws played a central role in the development 
of pr. theory, Poisson’s law long stood isolated and ignored. We shall 
see later that there was a deep reason for this isolation and also that, 
unexpectedly enough, Poisson’s law is, in a sense to be made precise, 
more fundamental for the central limit problem than the two others. 
With the notation introduced above, the three first limit theorems 
can be summarized as follows: 


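A. FIRST LIMIT THEOREMS. In the Bernoulli case
$\mathcal{L}\bigl(\frac{S_n}{n} - p\bigr) \to \mathcal{L}(0)$ and
$\mathcal{L}\bigl(\frac{S_n - np}{\sqrt{npq}}\bigr) \to \mathcal{N}(0, 1)$, while in the Poisson
case $\mathcal{L}(S_{nn}) \to \mathcal{P}(\lambda)$.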


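The proof by means of ch.f.’s reduces to elementary computations.
We have, taking limited expansions of exponentials,

\[
E \exp\Bigl[iu\Bigl(\frac{S_n}{n} - p\Bigr)\Bigr] = \prod_{k=1}^{n} E \exp\Bigl[iu \frac{X_k - p}{n}\Bigr]
 = \Bigl(p \exp\Bigl[\frac{iuq}{n}\Bigr] + q \exp\Bigl[-\frac{iup}{n}\Bigr]\Bigr)^n
 = \Bigl(1 + o\Bigl(\frac{1}{n}\Bigr)\Bigr)^n \to 1,
\]

\[
E \exp\Bigl[iu \frac{S_n - np}{\sqrt{npq}}\Bigr] = \prod_{k=1}^{n} E \exp\Bigl[iu \frac{X_k - p}{\sqrt{npq}}\Bigr]
 = \Bigl(p \exp\Bigl[\frac{iuq}{\sqrt{npq}}\Bigr] + q \exp\Bigl[-\frac{iup}{\sqrt{npq}}\Bigr]\Bigr)^n
 = \Bigl(1 - \frac{u^2}{2n} + o\Bigl(\frac{1}{n}\Bigr)\Bigr)^n \to \exp\Bigl[-\frac{u^2}{2}\Bigr],
\]

\[
E \exp[iuS_{nn}] = \prod_{k=1}^{n} E \exp[iuX_{nk}] = (p_n \exp[iu] + q_n)^n
 = \Bigl(1 + \frac{\lambda}{n}(\exp[iu] - 1) + o\Bigl(\frac{1}{n}\Bigr)\Bigr)^n
 \to \exp[\lambda(e^{iu} - 1)].
\]

The three first limit laws give rise to the three first limit types:
the degenerate type of degenerate laws $\mathcal{L}(a)$ with $f(u) = e^{iua}$;
the normal type of normal laws $\mathcal{N}(a, b^2)$ with $f(u) = \exp\bigl[iua - \frac{b^2}{2}u^2\bigr]$;
the Poisson type of Poisson laws $\mathcal{P}(\lambda; a, b)$ with
$f(u) = \exp[iua + \lambda(e^{iub} - 1)]$.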


The three first limit theorems extend at once by means of the con- 
vergence of types theorem; we leave the corresponding statements to 
the reader. 
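
The three ch.f. computations of the proof can be evaluated for a large but finite $n$. This is a numerical sketch only: the values of $u$, $n$, $p$, $\lambda$ and the function names are illustrative assumptions, and $p_n = \lambda/n$ is taken exactly.

```python
import cmath
import math

def bernoulli_lln_chf(u, n, p):
    """E exp[iu(S_n/n - p)] in the Bernoulli case."""
    q = 1.0 - p
    return (p * cmath.exp(1j * u * q / n) + q * cmath.exp(-1j * u * p / n)) ** n

def bernoulli_normal_chf(u, n, p):
    """E exp[iu(S_n - np)/sqrt(npq)] in the Bernoulli case."""
    q = 1.0 - p
    s = math.sqrt(n * p * q)
    return (p * cmath.exp(1j * u * q / s) + q * cmath.exp(-1j * u * p / s)) ** n

def poisson_case_chf(u, n, lam):
    """E exp[iu S_nn] in the Poisson case with p_n = lam / n."""
    p = lam / n
    return (p * cmath.exp(1j * u) + (1.0 - p)) ** n

u, n, p, lam = 1.5, 10**6, 0.3, 2.0
err_degenerate = abs(bernoulli_lln_chf(u, n, p) - 1.0)
err_normal = abs(bernoulli_normal_chf(u, n, p) - math.exp(-u * u / 2.0))
err_poisson = abs(poisson_case_chf(u, n, lam)
                  - cmath.exp(lam * (cmath.exp(1j * u) - 1.0)))
```

All three errors are small for this $n$; the normal one decreases most slowly, in line with the $1/\sqrt{n}$ scale of the norming.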

*20.2 Composition and decomposition. The three first limit types
possess an important closure property. Its deep parts are the normal
and the Poisson “decompositions” discovered between 1935 and 1937.
P. Lévy surmised and Cramér proved the first one and, then, Raikov
proved the second one.

Let $\mathcal{L}(X)$, $\mathcal{L}(X_1)$, $\mathcal{L}(X_2)$ be laws of r.v.’s with corresponding ch.f.’s
$f$, $f_1$, $f_2$. We say that $\mathcal{L}(X)$ is composed of $\mathcal{L}(X_1)$ and $\mathcal{L}(X_2)$ or that
$\mathcal{L}(X_1)$ and $\mathcal{L}(X_2)$ are components of $\mathcal{L}(X)$ if, $X_1$ and $X_2$ being
independent, $\mathcal{L}(X) = \mathcal{L}(X_1 + X_2)$ or, equivalently, if $f = f_1 f_2$.


A. COMPOSITION AND DECOMPOSITION THEOREM. The degenerate and
the normal types are closed under compositions and under decompositions.
The same is true of every family of Poisson laws $\mathcal{P}(\lambda; a, b)$ with the same $b$.


To avoid exceptions we consider degenerate laws as degenerate normal
and as degenerate Poisson ones.
Proof. 1° Closure under compositions

\[
\begin{aligned}
\mathcal{L}(a_1) * \mathcal{L}(a_2) &= \mathcal{L}(a_1 + a_2) \\
\mathcal{N}(a_1, b_1^2) * \mathcal{N}(a_2, b_2^2) &= \mathcal{N}(a_1 + a_2,\, b_1^2 + b_2^2) \\
\mathcal{P}(\lambda_1; a_1, b) * \mathcal{P}(\lambda_2; a_2, b) &= \mathcal{P}(\lambda_1 + \lambda_2;\, a_1 + a_2,\, b)
\end{aligned}
\]

follows at once by means of ch.f.’s, for

\[
\begin{aligned}
e^{iua_1} \cdot e^{iua_2} &= e^{iu(a_1 + a_2)} \\
\exp\Bigl[iua_1 - \frac{b_1^2}{2}u^2\Bigr] \cdot \exp\Bigl[iua_2 - \frac{b_2^2}{2}u^2\Bigr]
 &= \exp\Bigl[iu(a_1 + a_2) - \frac{b_1^2 + b_2^2}{2}u^2\Bigr] \\
\exp[iua_1 + \lambda_1(e^{iub} - 1)] \cdot \exp[iua_2 + \lambda_2(e^{iub} - 1)]
 &= \exp[iu(a_1 + a_2) + (\lambda_1 + \lambda_2)(e^{iub} - 1)].
\end{aligned}
\]
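
Closure of the Poisson laws under composition can also be seen directly on the pr.'s, without ch.f.'s. A minimal sketch, assuming $a_1 = a_2 = 0$, $b = 1$ and illustrative values of $\lambda_1$, $\lambda_2$; the helper `convolve` is a hypothetical name for the composition of two laws on the nonnegative integers.

```python
from math import exp, factorial

def poisson_pr(lam, k):
    return lam**k * exp(-lam) / factorial(k)

def convolve(p1, p2):
    """Pr.'s of X1 + X2 for independent X1, X2 on 0, 1, 2, ... (truncated lists)."""
    out = [0.0] * (len(p1) + len(p2) - 1)
    for i, a in enumerate(p1):
        for j, b in enumerate(p2):
            out[i + j] += a * b
    return out

lam1, lam2, kmax = 0.7, 1.8, 40
law1 = [poisson_pr(lam1, k) for k in range(kmax + 1)]
law2 = [poisson_pr(lam2, k) for k in range(kmax + 1)]
# Entries 0..kmax of the convolution are exact, since they only use terms <= kmax.
composed = convolve(law1, law2)[: kmax + 1]
target = [poisson_pr(lam1 + lam2, k) for k in range(kmax + 1)]
max_gap = max(abs(c - t) for c, t in zip(composed, target))
```

The gap is at the level of rounding error. The converse direction (decomposition) is the deep part and is not something such a check can illustrate.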


The decomposition property of the degenerate type is immediate.
For, if for every $u \in R$, $f_1(u) f_2(u) = e^{iua}$, then $|f_1||f_2| = 1$ and, since
$|f_1| \le 1$, $|f_2| \le 1$, it follows that $|f_1| = |f_2| = 1$, so that by 14.1a

\[ f_1(u) = e^{iua_1}, \qquad f_2(u) = e^{iua_2}, \qquad u \in R. \]

The proof in the normal and Poisson cases is much more involved.
To begin with, we can, by a linear change of variable, make $a = 0$ and
$b = 1$ in the laws to be decomposed. Thus, we have to seek ch.f.’s
$f_1$ and $f_2$ such that, for every $u \in R$,

\[ f_1(u) f_2(u) = e^{-u^2/2} \]
\[ f_1(u) f_2(u) = e^{\lambda(e^{iu} - 1)}. \]

2° We consider first the normal decomposition and apply 15.3A.
Since $e^{-z^2/2}$ is an entire nonvanishing function in the complex plane,
the same is true of $f_1(z)$ and $f_2(z)$, and there exists a constant $c > 0$
such that $|f_1(z)| \le e^{c|z|^2}$. Therefore, upon taking the principal branch
of $\log f_1(z)$ (vanishing at $z = 0$), it follows from the Hadamard factorization
theorem that $\log f_1(z)$ is a polynomial in $z$ of, at most, second degree.
Since $f_1(u)$, being a ch.f., reduces to 1 at $u = 0$, equals $\bar f_1(-u)$,
and is bounded on R, it follows that

\[ \log f_1(u) = iua_1 - \frac{b_1^2}{2} u^2, \qquad u \in R, \]

where $a_1$ and $b_1$ are real numbers. Similarly for $f_2(u)$, and the normal
decomposition is proved.

3° It remains for us to consider the Poisson decomposition. Let
$X_1$ and $X_2$ be two independent r.v.’s with d.f.’s $F_1$ and $F_2$, and let $F$
be the d.f. of their sum. Since

\[ [a_1 \le X_1 < b_1][a_2 \le X_2 < b_2] \subset [a_1 + a_2 \le X_1 + X_2 < b_1 + b_2] \]

and $X_1$, $X_2$ are independent, we have

\[ (1)\qquad F_1[a_1, b_1)\, F_2[a_2, b_2) \le F[a_1 + a_2,\, b_1 + b_2) \]

and, letting $b_1, b_2 \to \infty$, it follows that

\[ (2)\qquad F(a_1 + a_2) \le F_1(a_1) + F_2(a_2). \]

Let now $\alpha_1$ and $\alpha_2$ be points of increase of $F_1$ and $F_2$, respectively. If
$\alpha_1 \in (a_1, b_1)$ and $\alpha_2 \in (a_2, b_2)$ whence $\alpha_1 + \alpha_2 \in (a_1 + a_2, b_1 + b_2)$,
then the left-hand side in (1) is positive and, hence, $\alpha_1 + \alpha_2$ is point of
increase of $F$. Moreover, if $\alpha_1$ and $\alpha_2$ are first points of increase, then,
taking $a_1 < \alpha_1$ and $a_2 < \alpha_2$ in (2), we have $F(a_1 + a_2) = 0$, and,
hence, $\alpha_1 + \alpha_2$ is the first point of increase of $F$.

Now let $F$ be the Poisson d.f. corresponding to $\mathcal{P}(\lambda)$; its only points
of increase are $k = 0, 1, 2, \dots$. Therefore, on account of what precedes,
all points of increase $\alpha_1$ and $\alpha_2$ of its components $F_1$ and $F_2$
are such that $\alpha_1 + \alpha_2$ = some $k$ and the first points of increase are
$\alpha$ and $-\alpha$ where $\alpha$ is some finite number. It follows, replacing $F_1(x)$
by $F_1(x - \alpha)$ and $F_2(x)$ by $F_2(x + \alpha)$ (this does not change $F$), that
the new d.f.’s have $k = 0, 1, 2, \dots$ as the only possible points of
increase. Thus, we can set for the corresponding ch.f.’s

\[ f_1(u) = \sum_{k=0}^{\infty} a_k e^{iuk}, \qquad f_2(u) = \sum_{k=0}^{\infty} b_k e^{iuk} \]

with

\[ a_0, b_0 > 0, \qquad a_k, b_k \ge 0 \ \text{for}\ k > 0, \qquad \sum_{k=0}^{\infty} a_k = \sum_{k=0}^{\infty} b_k = 1. \]

Upon setting $z = e^{iu}$, $\varphi_1(z) = f_1(u)$, $\varphi_2(z) = f_2(u)$, we have to find
nonvanishing functions $\varphi_1$ and $\varphi_2$ such that

\[ \varphi_1(z)\varphi_2(z) = \sum_{k,l=0}^{\infty} a_k b_l z^{k+l} = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} z^k. \]

Therefore,

\[ a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0 = \frac{\lambda^k e^{-\lambda}}{k!} \]

and it follows that

\[ a_k \le \frac{1}{b_0} \frac{\lambda^k e^{-\lambda}}{k!}, \qquad |\varphi_1(z)| \le \frac{1}{b_0} e^{\lambda(|z| - 1)}. \]

Thus, $\varphi_1(z)$ and similarly $\varphi_2(z)$ are nonvanishing entire functions at
most of first order. It follows from the Hadamard factorization theorem
that they are of the form $e^{cz + d}$. Since $f_1(u)$ reduces to 1 at $u = 0$ and
is bounded by 1, we have

\[ \log f_1(u) = \lambda_1(e^{iu} - 1), \qquad \lambda_1 \ge 0. \]

Similarly for $f_2(u)$, and the Poisson decomposition is proved. This
terminates the proof of the theorem.


§ 21. EVOLUTION OF THE PROBLEM


21.1 The problem and preliminary solutions. From the time of
Laplace and until 1935, the limit problem aims at weakenings of the
assumptions under which the law of large numbers (convergence to $\mathcal{L}(0)$)
and the normal convergence (convergence to $\mathcal{N}(0, 1)$) hold. This
classical problem can be stated as follows:

Let $S_n = \sum_{k=1}^{n} X_k$ be consecutive sums of independent r.v.’s. Find
conditions under which

\[
\mathcal{L}\Bigl(\frac{S_n - ES_n}{n}\Bigr) \to \mathcal{L}(0), \qquad
\mathcal{L}\Bigl(\frac{S_n - ES_n}{\sigma S_n}\Bigr) \to \mathcal{N}(0, 1).
\]

It is implicitly assumed, in the first case, that the summands are
integrable, and in the second case that their squares also are integrable.
To simplify the writing, we shall center the summands at expectations,
so that, in this section, $EX_k = 0$, $ES_n = 0$. We also set $f_k(u) = Ee^{iuX_k}$,
$\sigma_k = \sigma X_k$ and $s_n = \sigma S_n$, and exclude the trivial case of all summands
degenerate.

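Although not the first historically, the solution of the extension of
the Bernoulli case to independent and identically distributed summands
(not necessarily indicators) is immediate when ch.f.’s are used.


A. If the summands are independent, identically distributed, and centered
at expectations, then $\mathcal{L}\bigl(\frac{S_n}{n}\bigr) \to \mathcal{L}(0)$ and
$\mathcal{L}\bigl(\frac{S_n}{s_n}\bigr) \to \mathcal{N}(0, 1)$.

For, if $f$ is the common ch.f. of the summands, then, by using its limited
expansions, we have

\[
E \exp\Bigl[iu \frac{S_n}{n}\Bigr] = f^n\Bigl(\frac{u}{n}\Bigr) = \Bigl(1 + o\Bigl(\frac{1}{n}\Bigr)\Bigr)^n \to 1
\]

and, since $s_n^2 = n\sigma^2 > 0$,

\[
E \exp\Bigl[iu \frac{S_n}{s_n}\Bigr] = f^n\Bigl(\frac{u}{s_n}\Bigr)
 = \Bigl(1 - \frac{\sigma^2}{2 s_n^2} u^2 + o\Bigl(\frac{\sigma^2}{s_n^2}\Bigr)\Bigr)^n
 = \Bigl(1 - \frac{u^2}{2n} + o\Bigl(\frac{1}{n}\Bigr)\Bigr)^n \to \exp\Bigl[-\frac{u^2}{2}\Bigr].
\]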

However, the first reasonably general conditions are the following.

B. Let $S_n = \sum_{k=1}^{n} X_k$ and $s_n = \sigma S_n$, where the summands are independent
r.v.’s centered at expectations.

(i) If $\frac{1}{n^{1+\delta}} \sum_{k=1}^{n} E|X_k|^{1+\delta} \to 0$ for a positive $\delta \le 1$, then
$\mathcal{L}\bigl(\frac{S_n}{n}\bigr) \to \mathcal{L}(0)$.

(ii) If $\frac{1}{s_n^{2+\delta}} \sum_{k=1}^{n} E|X_k|^{2+\delta} \to 0$ for a positive $\delta$, then
$\mathcal{L}\bigl(\frac{S_n}{s_n}\bigr) \to \mathcal{N}(0, 1)$.

The assumptions imply finiteness of moments $E|X_k|^{1+\delta}$ and
$E|X_k|^{2+\delta}$, respectively.
The first assertion is slightly more general than the classical ones.
For $\delta = 1$, it becomes the celebrated Tchebichev’s theorem. It also
contains Markov’s theorem: if $E|X_k|^{1+\delta} \le c < \infty$, then
$\mathcal{L}\bigl(\frac{S_n}{n}\bigr) \to \mathcal{L}(0)$ (since, then, the asserted condition becomes
$\frac{c}{n^{\delta}} \to 0$); since, for $\delta > 1$,
$EX_k^2 \le (E|X_k|^{1+\delta})^{2/(1+\delta)}$, Markov’s theorem is valid with any $\delta > 0$.

The second assertion is the celebrated Liapounov’s theorem which has
been the turning point for the entire Central Limit theorem. Moreover,
while the ch.f.’s were known to and used by Laplace, the first
continuity theorem for ch.f.’s:

\[ \text{if } f_n(u) \to e^{-u^2/2}, \text{ then } \mathcal{L}(X_n) \to \mathcal{N}(0, 1), \]

is to be found, proved but not stated, in Liapounov’s proof of his theorem.
We observe that (ii) has content only when at least one of the
r.v.’s is not degenerate at zero and, then, the hypothesis implies that
$s_n \to \infty$.

Proof. 1° To begin with, let us reduce in (ii) the case $\delta > 1$ to
$\delta = 1$, so that it will suffice to assume that $0 < \delta \le 1$.

Let $Y$ be a r.v. whose d.f. is $\frac{1}{n} \sum_{k=1}^{n} F_k$ and, hence,

\[ E|Y|^r = \frac{1}{n} \sum_{k=1}^{n} E|X_k|^r. \]

According to 9.3b, $\log E|Y|^r$ is a convex from below function of $r > 0$.
Therefore, for $2 + \delta > 3$, we have

\[ \delta \log E|Y|^3 \le (\delta - 1) \log E|Y|^2 + \log E|Y|^{2+\delta} \]

or, equivalently,

\[
\frac{1}{s_n^3} \sum_{k=1}^{n} E|X_k|^3 \le \Bigl(\frac{1}{s_n^{2+\delta}} \sum_{k=1}^{n} E|X_k|^{2+\delta}\Bigr)^{1/\delta}.
\]

It follows that, if the condition in (ii) holds for a $\delta > 1$, then it holds
for $\delta = 1$. Thus, in what follows we can limit ourselves to $0 < \delta \le 1$.
2° We use limited expansions of ch.f.’s, the continuity theorem, and
the expansion $\log(1 + z) = z + o(|z|)$ valid for $|z| < 1$. As usual,
$\theta$ with or without affixes denotes quantities bounded by 1.
Condition (i) implies that

\[
\max_{k \le n} \frac{E|X_k|^{1+\delta}}{n^{1+\delta}} \le \frac{1}{n^{1+\delta}} \sum_{k=1}^{n} E|X_k|^{1+\delta} \to 0
\]

so that, for $u$ arbitrary but fixed,

\[
f_k\Bigl(\frac{u}{n}\Bigr) = 1 + \theta_{nk} \frac{2^{1-\delta}}{1+\delta} |u|^{1+\delta} \frac{E|X_k|^{1+\delta}}{n^{1+\delta}} \to 1
\]

uniformly in $k \le n$. Therefore, for $n$ sufficiently large,

\[
\sum_{k=1}^{n} \log f_k\Bigl(\frac{u}{n}\Bigr) = 2\theta_n |u|^{1+\delta} \frac{1}{n^{1+\delta}} \sum_{k=1}^{n} E|X_k|^{1+\delta} \to 0,
\]

and the first assertion is proved.
Condition (ii) implies that

\[
\max_{k \le n} \Bigl(\frac{\sigma_k}{s_n}\Bigr)^{2+\delta} \le \max_{k \le n} \frac{E|X_k|^{2+\delta}}{s_n^{2+\delta}}
 \le \frac{1}{s_n^{2+\delta}} \sum_{k=1}^{n} E|X_k|^{2+\delta} \to 0,
\]

so that, for $u$ arbitrary but fixed,

\[
f_k\Bigl(\frac{u}{s_n}\Bigr) = 1 - \frac{u^2}{2} \frac{\sigma_k^2}{s_n^2}
 + \theta_{nk} \frac{2^{1-\delta}}{(1+\delta)(2+\delta)} |u|^{2+\delta} \frac{E|X_k|^{2+\delta}}{s_n^{2+\delta}} \to 1
\]

uniformly in $k \le n$. Therefore, for $n$ sufficiently large,

\[
\sum_{k=1}^{n} \log f_k\Bigl(\frac{u}{s_n}\Bigr) = -\frac{u^2}{2}(1 + o(1))
 + 2\theta'_n |u|^{2+\delta} \frac{1}{s_n^{2+\delta}} \sum_{k=1}^{n} E|X_k|^{2+\delta} \to -\frac{u^2}{2},
\]

and Liapounov’s theorem is proved.


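BOUNDED CASE. If the summands are uniformly bounded, then
$\mathcal{L}(S_n/n) \to \mathcal{L}(0)$. If, moreover, $s_n \to \infty$, then $\mathcal{L}(S_n/s_n) \to \mathcal{N}(0, 1)$.


For, if $|X_k| \le c < \infty$, then $E|X_k|^{1+\delta} \le c^{1+\delta}$ and $E|X_k|^{2+\delta} \le c^{\delta}\sigma_k^2$,
and, hence,

\[
\frac{1}{n^{1+\delta}} \sum_{k=1}^{n} E|X_k|^{1+\delta} \le \frac{c^{1+\delta}}{n^{\delta}} \to 0, \qquad
\frac{1}{s_n^{2+\delta}} \sum_{k=1}^{n} E|X_k|^{2+\delta} \le \frac{c^{\delta}}{s_n^{\delta}} \to 0 \ \text{as}\ s_n \to \infty.
\]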
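
The bounded case can be tried on a sequence whose variances are not summable but grow slowly. The sketch below is illustrative only: the choice $X_k = I_k - p_k$ with indicators of pr. $p_k = 1/(k+1)$ (so that $|X_k| \le 1$ and $s_n^2$ grows like $\log n$) and the function name are assumptions made for this example, with $\delta = 1$.

```python
import math

def liapounov_ratio(n):
    """Return ((1/s_n^3) * sum_k E|X_k|^3, s_n) for X_k = I_k - p_k, where I_k is
    an indicator with pr. p_k = 1/(k+1) (illustrative choice, not from the text)."""
    var_sum = 0.0      # s_n^2 = sum of p_k q_k
    third_sum = 0.0    # sum of E|X_k|^3 = p_k q_k (p_k^2 + q_k^2)
    for k in range(1, n + 1):
        p = 1.0 / (k + 1)
        q = 1.0 - p
        var_sum += p * q
        third_sum += p * q * (p * p + q * q)
    s_n = math.sqrt(var_sum)
    return third_sum / s_n**3, s_n

ratios = {n: liapounov_ratio(n) for n in (10, 1000, 100000)}
```

With $c = 1$ and $\delta = 1$ the ratio is bounded by $1/s_n$, so it tends to 0, although very slowly here because $s_n \to \infty$ only logarithmically.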


Tools for solution. The preceding theorem is not satisfactory since 
moments of higher order than those which figure in the formulation of 
the problem are used. Yet a restatement of this theorem with 6 = 1, 
together with the truncation method, will provide the stepping stone 


towards the solution. 
a. BASIC LEMMA. If $S_{nn} = \sum_{k=1}^{n} X_{nk}$, where the summands are independent
r.v.’s (centered at expectations), then

(i) if $\frac{1}{n^2} \sum_{k=1}^{n} E|X_{nk}|^2 \to 0$, then $\mathcal{L}\bigl(\frac{S_{nn}}{n}\bigr) \to \mathcal{L}(0)$;

(ii) if $\frac{1}{s_{nn}^3} \sum_{k=1}^{n} E|X_{nk}|^3 \to 0$, then $\mathcal{L}\bigl(\frac{S_{nn}}{s_{nn}}\bigr) \to \mathcal{N}(0, 1)$.

It suffices to replace in the proof of 21.2B subscripts $k$ and $n$ by double
subscripts $nk$ and $nn$, respectively.

In order to use the truncation method we shall require a weak form
of the equivalence lemma. We say that two sequences $\mathcal{L}(X_n)$ and
$\mathcal{L}(X'_n)$ of laws are equivalent if, for every subsequence $\mathcal{L}(X_{n'}) \to \mathcal{L}(X)$,
we have $\mathcal{L}(X'_{n'}) \to \mathcal{L}(X)$, and conversely.


b. LAW-EQUIVALENCE LEMMA. If $X_n - X'_n \xrightarrow{P} 0$ or $P[X_n \ne X'_n] \to 0$,
then the sequences $\mathcal{L}(X_n)$ and $\mathcal{L}(X'_n)$ of laws are equivalent.


For the second condition implies the first one which, by 10.1d, implies
the asserted equivalence.

21.2 Solution of the Classical Limit Problem. We are now in a
position to give a complete solution of the problem.
$X_1, X_2, \dots$ are independent r.v.’s centered at expectations, with
d.f.’s $F_1, F_2, \dots$, ch.f.’s $f_1, f_2, \dots$, and variances $\sigma_1^2, \sigma_2^2, \dots$;
$S_n = \sum_{k=1}^{n} X_k$ are their consecutive sums with variances $s_n^2 = \sum_{k=1}^{n} \sigma_k^2$.
To simplify the writing, we make the convention that all summations
are over $k = 1, \dots, n$.


A. CLASSICAL DEGENERATE CONVERGENCE CRITERION.
$\mathcal{L}\bigl(\frac{S_n}{n}\bigr) \to \mathcal{L}(0)$ if, and only if,

(i) $\sum \int_{|x| \ge n} dF_k \to 0$,

(ii) $\frac{1}{n} \sum \int_{|x| < n} x\, dF_k \to 0$,

(iii) $\frac{1}{n^2} \sum \Bigl\{\int_{|x| < n} x^2\, dF_k - \Bigl(\int_{|x| < n} x\, dF_k\Bigr)^2\Bigr\} \to 0$.

Proof. 1° Let (i), (ii), and (iii) hold. We wish to prove that
$\mathcal{L}\bigl(\frac{S_n}{n}\bigr) \to \mathcal{L}(0)$. In what follows we apply the law equivalence lemma
and the first part of the basic lemma.
Let $S_{nn} = \sum X_{nk}$, where $X_{nk} = X_k$ or 0 according as $|X_k| < n$ or
$|X_k| \ge n$. On account of (i)

\[
P\Bigl[\frac{S_{nn}}{n} \ne \frac{S_n}{n}\Bigr] \le \sum P[X_{nk} \ne X_k] = \sum \int_{|x| \ge n} dF_k \to 0,
\]

so that it suffices to prove that $\mathcal{L}\bigl(\frac{S_{nn}}{n}\bigr) \to \mathcal{L}(0)$. But, on account of
(ii),

\[ \frac{1}{n} ES_{nn} = \frac{1}{n} \sum \int_{|x| < n} x\, dF_k \to 0, \]

so that it suffices to prove that $\mathcal{L}\bigl(\frac{S_{nn} - ES_{nn}}{n}\bigr) \to \mathcal{L}(0)$. But this
follows, by Tchebichev inequality, from (iii) and

\[
\frac{1}{n^2} \sum E|X_{nk} - EX_{nk}|^2
 = \frac{1}{n^2} \sum \Bigl\{\int_{|x| < n} x^2\, dF_k - \Bigl(\int_{|x| < n} x\, dF_k\Bigr)^2\Bigr\} \to 0.
\]


2° Conversely, let $\mathcal{L}\bigl(\frac{S_n}{n}\bigr) \to \mathcal{L}(0)$; equivalently, $\frac{S_n}{n} \xrightarrow{P} 0$ or
$g_n(u) = \prod_{k=1}^{n} f_k(u/n) \to 1$ uniformly on every finite interval. Let $n$ be
sufficiently large so that $\log|g_n(u)|$ is bounded on $[-c, +c]$. By the weak
symmetrization lemma and the second truncation inequality

= | 


i Xi — 
eal 


k=1 


X is 
43 


Since 


so that $\mu X_k/n \to 0$, it follows that the foregoing relation with $c > 1$ yields
(i) and, hence, $\mathcal{L}(S_{nn}/n) \to \mathcal{L}(0)$. But, by the first truncation
inequality,

\[
(1)\qquad 2\sum_{k=1}^{n} \sigma^2(X_{nk}/n) = \sum_{k=1}^{n} \sigma^2(X^s_{nk}/n) \le -3 \log|g_n(1)|^2 \to 0,
\]

so that (iii) holds, and, by Tchebichev inequality,
$\frac{S_{nn} - ES_{nn}}{n} \xrightarrow{P} 0$. Therefore, $\frac{1}{n} ES_{nn} \to 0$
and (ii) holds. The proof is completed.
Observe that centering at, and in fact existence of, expectations were
not required. Also, according to the proof,

\[
\mathcal{L}\Bigl(\frac{S_n - ES_{nn}}{n}\Bigr) \to \mathcal{L}(0) \iff \text{(i) and (iii) hold.}
\]


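B. CLASSICAL NORMAL CONVERGENCE CRITERION.
$\mathcal{L}\bigl(\frac{S_n}{s_n}\bigr) \to \mathcal{N}(0, 1)$ and $\max_{k \le n} \frac{\sigma_k}{s_n} \to 0$ if, and only if,
for every $\epsilon > 0$,

\[ g_n(\epsilon) = \frac{1}{s_n^2} \sum \int_{|x| \ge \epsilon s_n} x^2\, dF_k \to 0. \]

The “if” part is due to Lindeberg and the “only if” part is due to Feller.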
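
The function $g_n(\epsilon)$ of the criterion is easy to evaluate for summands with finitely many values. The sketch below is an illustration under assumptions made here: the two example sequences (symmetric $\pm 1$ summands, and symmetric $\pm 2^k$ summands) and the helper names are not taken from the text.

```python
import math

def lindeberg(laws, eps):
    """g_n(eps) = (1/s_n^2) * sum_k E[X_k^2 ; |X_k| >= eps * s_n] for centered
    summands given as finite laws [(value, pr.), ...]."""
    s2 = sum(pr * x * x for law in laws for x, pr in law)
    cut = eps * math.sqrt(s2)
    tail = sum(pr * x * x for law in laws for x, pr in law if abs(x) >= cut)
    return tail / s2

def symmetric(c):
    """Law of a r.v. equal to +c or -c with pr. 1/2 each."""
    return [(c, 0.5), (-c, 0.5)]

bounded = [symmetric(1.0) for _ in range(100)]        # s_n = 10
growing = [symmetric(2.0**k) for k in range(1, 21)]   # the last summand dominates

g_bounded = lindeberg(bounded, 0.2)   # cut = 2 > 1: no mass beyond it
g_growing = lindeberg(growing, 0.5)   # stays near 3/4
```

In the second sequence $\max_k \sigma_k/s_n$ does not tend to 0 either, so both parts of the criterion fail together, as the theorem requires.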
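Proof. 1° Let $g_n(\epsilon) \to 0$ for every $\epsilon > 0$. We apply the law
equivalence lemma and the basic lemma.
Since $g_n(\epsilon) \to 0$ for every $\epsilon > 0$, there is a sufficiently slowly
decreasing sequence $\epsilon_n \downarrow 0$ such that $\frac{1}{\epsilon_n^2} g_n(\epsilon_n) \to 0$ and, a fortiori,
$\frac{1}{\epsilon_n} g_n(\epsilon_n) \to 0$, $g_n(\epsilon_n) \to 0$ (it suffices to select a sequence
$n_k \uparrow \infty$ as $k \to \infty$ such that $g_n\bigl(\frac{1}{k}\bigr) < \frac{1}{k^3}$ for $n \ge n_k$, and, then, take
$\epsilon_n = \frac{1}{k}$ for $n_k \le n < n_{k+1}$). We have

\[
\max_{k \le n} \frac{\sigma_k^2}{s_n^2} \le \max_{k \le n} \frac{1}{s_n^2} \int_{|x| \ge \epsilon_n s_n} x^2\, dF_k + \epsilon_n^2
 \le g_n(\epsilon_n) + \epsilon_n^2 \to 0,
\]

and the “if” assertion will be proved if we show that $\mathcal{L}\bigl(\frac{S_n}{s_n}\bigr) \to \mathcal{N}(0, 1)$.

Let $X_{nk} = X_k$ or 0 according as $|X_k| < \epsilon_n s_n$ or $|X_k| \ge \epsilon_n s_n$. Since

\[
P\Bigl[\frac{S_{nn}}{s_n} \ne \frac{S_n}{s_n}\Bigr] \le \sum P[X_{nk} \ne X_k]
 = \sum \int_{|x| \ge \epsilon_n s_n} dF_k \le \frac{1}{\epsilon_n^2} g_n(\epsilon_n) \to 0,
\]

it suffices to prove that $\mathcal{L}\bigl(\frac{S_{nn}}{s_n}\bigr) \to \mathcal{N}(0, 1)$.

Since the $X_k$ are centered at expectations, we have

\[
|EX_{nk}| = \Bigl|\int_{|x| < \epsilon_n s_n} x\, dF_k\Bigr| = \Bigl|\int_{|x| \ge \epsilon_n s_n} x\, dF_k\Bigr|
 \le \frac{1}{\epsilon_n s_n} \int_{|x| \ge \epsilon_n s_n} x^2\, dF_k.
\]

Therefore,

\[ \frac{1}{s_n} \sum |EX_{nk}| \le \frac{g_n(\epsilon_n)}{\epsilon_n} \to 0 \]

and, setting $s_{nn}^2 = \sigma^2 S_{nn}$, we obtain

\[
1 - \frac{s_{nn}^2}{s_n^2} \le g_n(\epsilon_n) + \frac{1}{s_n^2}\Bigl(\sum |EX_{nk}|\Bigr)^2
 \le g_n(\epsilon_n) + \frac{g_n^2(\epsilon_n)}{\epsilon_n^2} \to 0.
\]

Thus, it suffices to prove that $\mathcal{L}\bigl(\frac{S_{nn} - ES_{nn}}{s_{nn}}\bigr) \to \mathcal{N}(0, 1)$. But, this
follows from

\[
\frac{1}{s_{nn}^3} \sum E|X_{nk} - EX_{nk}|^3 \le \frac{2\epsilon_n s_n}{s_{nn}^3} \sum E(X_{nk} - EX_{nk})^2
 = 2\epsilon_n \frac{s_n}{s_{nn}} \to 0,
\]

and the “if” assertion is proved.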
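2° It remains to prove the “only if” assertion.
Since $\max_{k \le n} \frac{\sigma_k}{s_n} \to 0$, it follows from

\[ f_k\Bigl(\frac{u}{s_n}\Bigr) = 1 - \theta_k \frac{u^2}{2} \frac{\sigma_k^2}{s_n^2} \]

that

\[
\max_{k \le n} \Bigl|f_k\Bigl(\frac{u}{s_n}\Bigr) - 1\Bigr| \to 0, \qquad
\sum \Bigl|f_k\Bigl(\frac{u}{s_n}\Bigr) - 1\Bigr|^2 \to 0.
\]

Therefore, for $n$ sufficiently large, $\log f_k\bigl(\frac{u}{s_n}\bigr)$ exists, so that

\[ E \exp\Bigl[iu \frac{S_n}{s_n}\Bigr] = \prod f_k\Bigl(\frac{u}{s_n}\Bigr) \to \exp\Bigl[-\frac{u^2}{2}\Bigr] \]

becomes

\[ \sum \log f_k\Bigl(\frac{u}{s_n}\Bigr) \to -\frac{u^2}{2} \]

and, since $\log z = z - 1 + \theta|z - 1|^2$,

\[ \frac{u^2}{2} - \sum \Bigl\{1 - f_k\Bigl(\frac{u}{s_n}\Bigr)\Bigr\} \to 0. \]

Upon taking the real parts, we obtain

\[
\frac{u^2}{2} - \sum \int_{|x| < \epsilon s_n} \Bigl(1 - \cos\frac{ux}{s_n}\Bigr) dF_k
 = \sum \int_{|x| \ge \epsilon s_n} \Bigl(1 - \cos\frac{ux}{s_n}\Bigr) dF_k + o(1).
\]

Since

\[
\sum \int_{|x| < \epsilon s_n} \Bigl(1 - \cos\frac{ux}{s_n}\Bigr) dF_k \le \frac{u^2}{2s_n^2} \sum \int_{|x| < \epsilon s_n} x^2\, dF_k
 = \frac{u^2}{2s_n^2}\Bigl(s_n^2 - \sum \int_{|x| \ge \epsilon s_n} x^2\, dF_k\Bigr) = \frac{u^2}{2}(1 - g_n(\epsilon))
\]

and

\[
\sum \int_{|x| \ge \epsilon s_n} \Bigl(1 - \cos\frac{ux}{s_n}\Bigr) dF_k \le 2 \sum \int_{|x| \ge \epsilon s_n} dF_k
 \le \frac{2}{\epsilon^2 s_n^2} \sum \int_{|x| \ge \epsilon s_n} x^2\, dF_k \le \frac{2}{\epsilon^2 s_n^2} \sum \sigma_k^2 = \frac{2}{\epsilon^2},
\]

it follows that

\[ \frac{u^2}{2} g_n(\epsilon) \le \frac{2}{\epsilon^2} + o(1). \]

Therefore, letting $n \to \infty$ and then $u \to \infty$ in

\[ 0 \le g_n(\epsilon) \le \frac{2}{u^2}\Bigl(\frac{2}{\epsilon^2} + o(1)\Bigr), \]

we obtain $g_n(\epsilon) \to 0$. This concludes the proof.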

*21.3 Normal approximation. In his celebrated investigation of nor- 
mal convergence, Liapounov examined not only conditions for, but 
also the speed of, this convergence. His results were greatly improved 
by Berry (and, independently, by Esseen) and to present the basic one 
we shall proceed in steps. 

Let $F$ and $G$ be d.f.’s of r.v.’s with corresponding ch.f.’s $f$ and
$g$, and let $H = F - G$, $h = f - g$. We exclude the trivial case of
$\alpha = \sup|H| = 0$, that is, $H = h = 0$.


a. If $G$ is continuous on R, then there exists a finite number $s$ such that
either $H(s) = -\alpha$ or $H(s + 0) = \alpha$.

Proof. Let $x_n$ be a sequence such that $|H(x_n)| \to \alpha$. It contains a
subsequence $x_{n'} \to s$ finite or infinite. Since $H(x) \to 0$ as $x \to \pm\infty$
and $\alpha > 0$, $s$ must be finite.

The sequence $x_{n'}$ contains a subsequence $x_{n''}$ such that either
$H(x_{n''}) \to -\alpha$ or $H(x_{n''}) \to +\alpha$. It suffices to consider one case only,
say the first, for the same argument is valid for the other. Thus, let
$x_n \to s$, $H(x_n) \to -\alpha$; we know that $H$ is continuous from the left.

If the sequence $x_n$ contains a subsequence converging to $s$ from the
left, then $-\alpha = \lim H(x_{n'}) = H(s)$, and the assertion is proved.
Otherwise, this sequence contains a subsequence converging to $s$ from
the right, $-\alpha = H(s + 0)$ and, $G$ being continuous on R,

\[ -\alpha \le H(s) \le F(s + 0) - G(s) = F(s + 0) - G(s + 0) = -\alpha, \]

so that $-\alpha = H(s)$. The assertion is proved.
Let $p$ be the derivative of a symmetric d.f. (of a r.v.) differentiable
on R, so that $p(x) = p(-x)$, $x \in R$.


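b. If $G$ has a derivative $G'$ on R, then there exists a finite number “a”
such that

\[
\Bigl|\int H(x + a) p(x)\, dx\Bigr| \ge \frac{\alpha}{2}\Bigl(1 - 3\int_{|x| \ge \alpha/2\beta} p(x)\, dx\Bigr), \qquad \beta = \sup|G'|.
\]

Proof. If $\beta = \infty$, then $\frac{\alpha}{2\beta} = 0$, and the inequality is trivially true
whatever be $a$. Thus, it suffices to prove it when $\beta < \infty$. Let
$\gamma = \frac{\alpha}{2\beta}$ $(> 0)$.

We have, for an arbitrary $a$,

\[
(1)\qquad \Bigl|\int H(x + a) p(x)\, dx\Bigr| \ge \Bigl|\int_{|x| < \gamma} H(x + a) p(x)\, dx\Bigr| - \Bigl|\int_{|x| \ge \gamma} H(x + a) p(x)\, dx\Bigr|
\]

and

\[
(2)\qquad \Bigl|\int_{|x| \ge \gamma} H(x + a) p(x)\, dx\Bigr| \le \alpha \int_{|x| \ge \gamma} p(x)\, dx.
\]

On the other hand, according to a, there exists a finite number $s$ such
that, say, $-\alpha = H(s)$. For $|x| < \gamma$, we have, setting $a = s - \gamma$ so
that

\[ s - 2\gamma < x + a < s, \qquad x - \gamma < 0, \]

the relation

\[ G(x + a) = G(s) + (x - \gamma) G'(x'), \qquad 0 \le G'(x') \le \beta, \quad s - 2\gamma < x' < s. \]

Thus, for $|x| < \gamma$,

\[
\begin{aligned}
H(x + a) &= F(x + a) - G(s) - (x - \gamma) G'(x') \\
 &\le F(s) - G(s) - \beta(x - \gamma) \\
 &= -\alpha - \beta(x - \gamma) = -\beta(x + \gamma),
\end{aligned}
\]

and it follows that

\[
(3)\qquad \int_{|x| < \gamma} H(x + a) p(x)\, dx \le -\beta \int_{|x| < \gamma} (x + \gamma) p(x)\, dx
 = -\beta\gamma \int_{|x| < \gamma} p(x)\, dx = -\frac{\alpha}{2}\Bigl(1 - \int_{|x| \ge \gamma} p(x)\, dx\Bigr).
\]

Upon substituting in (1) the bounds given by (2) and (3), we obtain

\[
\Bigl|\int H(x + a) p(x)\, dx\Bigr| \ge \frac{\alpha}{2}\Bigl(1 - 3\int_{|x| \ge \gamma} p(x)\, dx\Bigr)
\]

and the assertion follows. In the case $\alpha = H(s + 0)$, the argument is
similar.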


Let $\omega$ be a real ch.f. with $\int |\omega(u)|\, du < \infty$, so that the corresponding
d.f. has a symmetric derivative continuous on R, given by

\[
p(x) = \frac{1}{2\pi} \int e^{-iux} \omega(u)\, du = \frac{1}{2\pi} \int \cos ux \cdot \omega(u)\, du.
\]


c. For every $a \in R$

\[
\frac{1}{2\pi} \int \Bigl|\frac{h(u)\omega(u)}{u}\Bigr| du \ge \Bigl|\int H(x + a) p(x)\, dx\Bigr|.
\]

Proof. We can assume $\frac{h(u)\omega(u)}{u}$ to be integrable, for otherwise the
inequality is trivially true. According to the composition theorem,
$h\omega$ is the Fourier-Stieltjes transform of $\tilde H$ defined by

\[ \tilde H(x) = \int H(x - y) p(y)\, dy. \]

Since $\frac{h(u)\omega(u)}{u}$ is integrable, the inversion formula yields

\[
\tilde H(x) - \tilde H(x') = \frac{1}{2\pi} \int \frac{e^{-iux} - e^{-iux'}}{-iu}\, h(u)\omega(u)\, du.
\]

But, as $x' \to -\infty$, $\tilde H(x') \to 0$ and, by the Riemann-Lebesgue theorem,
$\int e^{-iux'} \frac{h(u)\omega(u)}{-iu}\, du \to 0$. Therefore,

\[
\int H(x - y) p(y)\, dy = \frac{1}{2\pi} \int e^{-iux} \frac{h(u)\omega(u)}{-iu}\, du
\]

and, hence, replacing $x$ by $a$, $y$ by $-x$, and taking into account that $p$
is symmetric, we obtain

\[
\int H(x + a) p(x)\, dx = \frac{1}{2\pi} \int e^{-iua} \frac{h(u)\omega(u)}{-iu}\, du.
\]

The asserted inequality follows.
We are now in a position to establish the basic inequality below, of
independent interest. We shall require a real integrable function $\omega_0$
defined by $\omega_0(u) = 1 - \frac{|u|}{U}$ or 0 according as $|u| < U$ or $|u| \ge U$.
Its Fourier-Stieltjes transform $p_0$ is given by

\[
p_0(x) = \frac{1}{2\pi} \int_{-U}^{+U} \Bigl(1 - \frac{|u|}{U}\Bigr) \cos ux\, du = \frac{1 - \cos Ux}{\pi x^2 U}
\]

and we have $p_0 \ge 0$, $\int p_0(x)\, dx = 1$, so that $\omega_0$ is a ch.f.
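
The closed form of $p_0$ can be compared with a direct numerical evaluation of the defining integral. This is a sketch only: the midpoint rule, the value $U = 3$, and the function names are choices made for the illustration.

```python
import math

def p0_closed_form(x, U):
    """(1 - cos Ux) / (pi x^2 U), with the limiting value U / (2 pi) at x = 0."""
    if x == 0.0:
        return U / (2.0 * math.pi)
    return (1.0 - math.cos(U * x)) / (math.pi * x * x * U)

def p0_by_quadrature(x, U, steps=20000):
    """Midpoint rule for (1/2pi) * integral over (-U, U) of (1 - |u|/U) cos(ux) du."""
    h = 2.0 * U / steps
    total = 0.0
    for j in range(steps):
        u = -U + (j + 0.5) * h
        total += (1.0 - abs(u) / U) * math.cos(u * x)
    return total * h / (2.0 * math.pi)

U = 3.0
gaps = [abs(p0_closed_form(x, U) - p0_by_quadrature(x, U))
        for x in (0.0, 0.4, 1.0, 2.5)]
```

The two evaluations agree to within the quadrature error, and the closed form makes the nonnegativity of $p_0$ evident.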
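A. BASIC INEQUALITY. If $G$ has a derivative $G'$ on R, then, for every
$U > 0$,

\[
\sup|H| \le \frac{2}{\pi} \int_0^U \Bigl|\frac{h(u)}{u}\Bigr| du + \frac{24}{\pi U} \sup|G'|.
\]

Proof. Upon replacing $\omega$ and $p$ by $\omega_0$ and $p_0$, the propositions b and
c yield the inequality

\[
\frac{1}{\pi} \int_0^U \Bigl|\frac{h(u)}{u}\Bigr| du \ge \frac{1}{2\pi} \int_{-U}^{+U} \Bigl|\frac{h(u)\omega_0(u)}{u}\Bigr| du
 \ge \frac{\alpha}{2}\Bigl(1 - 3\int_{|x| \ge \alpha/2\beta} p_0(x)\, dx\Bigr)
\]

where

\[
\int_{|x| \ge \alpha/2\beta} p_0(x)\, dx = \frac{2}{\pi} \int_{\alpha/2\beta}^{\infty} \frac{1 - \cos Ux}{x^2 U}\, dx
 \le \frac{4}{\pi} \int_{\alpha/2\beta}^{\infty} \frac{dx}{x^2 U} = \frac{8\beta}{\pi\alpha U}.
\]

Therefore,

\[
\frac{1}{\pi} \int_0^U \Bigl|\frac{h(u)}{u}\Bigr| du \ge \frac{\alpha}{2} - \frac{12\beta}{\pi U}
\]

and the asserted inequality follows.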


In order to apply the basic inequality to the normal approximation
problem, we have to bound the corresponding $h$. Let $F^*_n$ and $G^*$ be
the d.f.’s of $\mathcal{L}\bigl(\frac{S_n}{s_n}\bigr)$ and $\mathcal{N}(0, 1)$ and let
$h^*_n = f^*_n - e^{-u^2/2}$ denote the difference of the corresponding ch.f.’s.
The summands $X_k$ are independent r.v.’s centered at expectations, and we set
$\gamma_k^3 = E|X_k|^3$, $g_n^3 = \frac{1}{s_n^3} \sum_{k=1}^{n} \gamma_k^3$. We exclude the case of one
of the $\gamma_k$ infinite, for then the normal approximation theorem below is
trivially true.


2 u? 
d. If|u| <5, then | h*,(u)| 3 2gn°| u |? exp a 
En 
Proof. 1° First, we prove the assertion under the supplementary 


- 1 
condition |u| =—. Then g,?| u|* = 1 and it suffices to prove that 
Z &. p 


n 
2 


| A*, (i) | < 2 exp E “| . But, since 
2 ue 
| h*,(u) | s | f*n(w) | + exp E “| Ss | f*n() | + exp E “|, 


9) 2 
it will suffice to prove that | f*,(z) |? S exp | - =| 


Consider the symmetrized r.v. $X_k - X'_k$ where $X_k$ and $X'_k$ are independent
and identically distributed, so that its ch.f. is $|f_k|^2$ and


\[ E(X_k - X'_k)^2 = 2\sigma_k^2, \qquad E|X_k - X'_k|^3 \le 8\gamma_k^3 < \infty. \]

Therefore,


2: 92 
| fu) |? S 1 — 04?u? + = vel u|® S exp | -onPu’ + a | u P| 


; Uu : ; 
and, replacing u by — and summing over k = Il, --+, 2, we obtain, 
Sn 


using the fact that, by assumption, S| iD. 


3] ,, [3 2 
exp | w+ SLE] exp | +] 


3.2 
2 
=| 304] 


1 
2° It remains to prove the assertion when | x | < — and, hence, 
n 


IIA 
IA 


| f*n (a) |? 


1 
= ae u 22g 25 


Sn Sn 2 2 


Then, we have 


2 3 
us ee en 
n(*)=1- Zhu ail =l1-n, 


Sn 25n 


where | r,| <4, so that 


u 
log fx (“) = -—-7rh + orn. 
5 
On the other hand, 


so that 


and, summing over &, we obtain 


ur -° 
log f*n(u) = — > + 0 u|3, 
Since, for every number $a$, $e^a = 1 + \theta'|a| e^{|a|}$, it follows, taking $a =$
gn 3 J a 

-—| u |? < — so that e* < 2, that 
24 2* 


£n° ur 
£27 [uP eo] 3 | 


f*n(u“) — exp E “] 


2 
< 2¢n3| un |8 exp | - “|. 
and the proof is complete. 


B. NORMAL APPROXIMATION THEOREM. There exists a numerical constant
$c < \infty$ such that, for all $x$ and all $n$, if $F^*_n$ is d.f. of $\mathcal{L}(S_n/s_n)$ and
$G^*$ is d.f. of $\mathcal{N}(0, 1)$, then

\[
|F^*_n(x) - G^*(x)| \le \frac{c}{s_n^3} \sum_{k=1}^{n} E|X_k|^3.
\]


For, upon replacing 4*, by its bound obtained above in the basic 


2 
inequality with U=—,, F = F*,, and G = G* hence sup | G’ | = 
§n 


Tag? We obtain 
s ise (fe on [-F] a+ Fe) 
S - gp u“ exp | — — | du + —=]}- 
ee 0 en 3 21 
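
The normal approximation theorem can be tested on the Bernoulli case, where $F^*_n$ is known exactly. The sketch below is an illustration, not the computation of the proof: the value `C_ASSUMED = 0.8` is an assumption borrowed from the later literature on the constant (the text only asserts that some finite $c$ exists), and the function names are hypothetical.

```python
import math

def normal_df(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_distance(n, p):
    """sup_x |F*_n(x) - G*(x)| for S_n a sum of n indicators with pr. p,
    normed by its expectation np and standard deviation sqrt(npq)."""
    q = 1.0 - p
    s_n = math.sqrt(n * p * q)
    below = 0.0                      # P[S_n < k]
    worst = 0.0
    for k in range(n + 1):
        x = (k - n * p) / s_n
        phi = normal_df(x)
        at_most = below + math.comb(n, k) * p**k * q**(n - k)   # P[S_n <= k]
        # F*_n is a step function, so the supremum is reached at a jump,
        # from the left or from the right.
        worst = max(worst, abs(below - phi), abs(at_most - phi))
        below = at_most
    return worst

def third_moment_ratio(n, p):
    """(1/s_n^3) * sum_k E|X_k - p|^3 for the centered indicators."""
    q = 1.0 - p
    return n * p * q * (p * p + q * q) / (n * p * q) ** 1.5

C_ASSUMED = 0.8   # assumption: a workable constant, not a value given in the text

checks = [(n, p, sup_distance(n, p), C_ASSUMED * third_moment_ratio(n, p))
          for n in (10, 50, 200) for p in (0.1, 0.3, 0.5)]
```

In every case listed the observed distance lies below the bound, and both decrease like $1/\sqrt{n}$.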


§ 22. CENTRAL LIMIT PROBLEM; THE CASE OF BOUNDED 
VARIANCES 


22.1 Evolution of the problem. The classical limit problem deals
with independent summands $X_n$ with finite first moments and, in the
normal convergence case, with finite second moments as well. Those
moments are used for changing origins and scales of values of the
consecutive sums $S_n = \sum_{k=1}^{n} X_k$ so as to avoid shifts of the pr. spreads
towards infinite values. There is no reason for these choices of “norming”
quantities except an historical one; they are a straightforward
extension to more general cases of the norming quantities which appeared
in the Bernoulli case. A priori, there is no reason to expect
that these quantities will continue to play the same role in the general
case. Furthermore, whether they are available (that is, exist and are
finite) or not, other choices might achieve the same purpose. Thus,
the problem becomes a search for conditions under which the law of
large numbers and the normal convergence hold for normed sums
$\frac{S_n}{b_n} - a_n$. The methods remain those of the classical problem, but the
computations become more involved. However, remnants of the two
first limit theorems in the Bernoulli case are still visible. For there is
no other reason to expect or to look for limit laws which are either
degenerate or normal.

The real liberation which gave birth to the Central Limit Problem 
came with a new approach due to P. Lévy. He stated and solved the 
following problem: Find the family of all possible limit laws of normed 
sums of independent and identically distributed r.v.’s. We saw that 
when these r.v.’s have a finite second moment, the limit law (with 
classical norming quantities) is normal. Thus, P. Lévy was concerned
primarily with the novel case of infinite second moments and finite or 
infinite first moments. 

Naturally, the question of all possible limit laws of normed sums
with independent, but not necessarily identically distributed, r.v.’s
arises at once. Yet, the Poisson limit theorem is still out, for it is
relative to sequences of sums and not to sequences of normed consecutive
sums. Moreover, as we shall find later (end 24.4), under “natural”
restrictions Poisson laws cannot be limit laws of sequences of normed
sums, which explains their isolation. But sequences $\frac{S_n}{b_n} - a_n$ are a
particular form of sequences $\sum_{k=1}^{n} X_{nk}$ (set $X_{nk} = \frac{X_k}{b_n} - \frac{a_n}{n}$), and this
provides the final modification of the problem.

The general outline of the Central Limit Problem is now visible:
Find the limit laws of sequences of sums of independent summands and
find conditions for convergence to a specified one. Yet, so general a
problem is without content. In fact, let $Y_n$ be arbitrary r.v.’s, set
$X_{n1} = Y_n$, and $X_{nk} = 0$ a.s. for $k > 1$ and every $n$. Then the sequence
of laws becomes the sequence $\mathcal{L}(Y_n)$, so that the family of possible limit
laws contains any law $\mathcal{L}$ (take $\mathcal{L}(Y_n) \equiv \mathcal{L}$). Thus, some restriction is
needed.

To find a "natural" one, let us consider the problems which led to 
this one. Their common feature is that the number of summands 
increases indefinitely and that the limit law remains the same if an 
arbitrary but finite number of summands is dropped. To emphasize this 
feature, we are led to the following "natural" restriction: the summands 
$X_{nk}$ are uniformly asymptotically negligible (uan), that is, $X_{nk} \xrightarrow{P} 0$ 
uniformly in $k$ or, equivalently, for every $\epsilon > 0$, 

$$\max_k P[\,|X_{nk}| \ge \epsilon\,] \to 0.$$

Finally, the precise formulation of the problem is as follows: 

CENTRAL LIMIT PROBLEM. Let $S_{nk_n} = \sum_{k=1}^{k_n} X_{nk}$ be sums of uan 
independent summands $X_{nk}$, with $k_n \to \infty$. 
1° Find the family of all possible limit laws of these sums. 
2° Find conditions for convergence to any specified law of this family. 


To simplify the writing, we make the following conventions valid for 
the whole chapter. 

(i) $k = 1, \dots, k_n$, $k_n \to \infty$; the summations $\sum_k$, the products $\prod_k$, 
the maxima $\max_k$ are over these values of $k$, and the limits are taken as 
usually for $n \to \infty$, unless otherwise stated. 

(ii) $F_{nk}$ and $f_{nk}$ denote the d.f. and the ch.f. of r.v.'s $X_{nk}$; $F_n$ and $f_n$ 
denote the d.f. and the ch.f. of $\sum_k X_{nk}$. Thus, the uan condition becomes: 
$\max_k \int_{|x| \ge \epsilon} dF_{nk} \to 0$ for every $\epsilon > 0$, and the assumption of 
independence becomes $f_n = \prod_k f_{nk}$. The problem becomes 

Given sequences $f_n = \prod_k f_{nk}$ of products of ch.f.'s of uan r.v.'s: 1° Find all 
ch.f.'s $f$ such that $f_n \to f$; 2° Find conditions under which $f_n \to f$ given. 

If these ch.f.'s have log's on $I = [-U, +U]$, we always select their 
principal branches, continuous and vanishing at $u = 0$, and then on $I$: 

$$\log f_n = \sum_k \log f_{nk}; \qquad f_n \to f \ \text{(uniformly)} \iff \log f_n \to \log f \ \text{(uniformly)}.$$

i 


The solution of the problem is due to the introduction, by de Finetti, 
of the “infinitely decomposable” family of laws and to the discovery 
of their explicit representation by Kolmogorov in the case of finite 
second moments and by P. Lévy in the general case. 

It has been obtained, with the help of the preceding family of laws, by 
the efforts of Kolmogorov, P. Lévy, Feller, Bawly, Khintchine, Marcin- 
kiewicz, Gnedenko, and Doblin (1931-1938). The final form is essen- 
tially due to Gnedenko. 

22.2 The case of bounded variances. As a preliminary to the in- 
vestigation of the general problem, and independently of it, we examine 
here the particular “‘case of bounded variances” —a “natural’’ extension 
of the classical normal convergence problem. It is much less involved 
computationally than the general one, while the method of attack is 
essentially the same. 

We consider sums $\sum_k X_{nk}$ of independent r.v.'s, centered at 
expectations, with d.f.'s $F_{nk}$, ch.f.'s $f_{nk}$ and finite variances 
$\sigma_{nk}^2 = \sigma^2 X_{nk}$ such that 

(C): $\max_k \sigma_{nk}^2 \to 0$ and $\sum_k \sigma_{nk}^2 \le c < \infty$, where $c$ is a constant 
independent of $n$. 

Since, for every $\epsilon > 0$, 

$$\max_k P[\,|X_{nk}| \ge \epsilon\,] \le \frac{1}{\epsilon^2} \max_k \sigma_{nk}^2 \to 0,$$

the uan condition is satisfied and the model is a particular case of that 
of the Central Limit Problem. The boundedness of the sequence of 
variances of the sums entails finiteness of the variance of the limit law. 


a. COMPARISON LEMMA. Under (C), $\log f_{nk}(u)$ exists and is finite for 
$n \ge n_u$ sufficiently large and, for any fixed $u$, 

$$\sum_k \{\log f_{nk}(u) - (f_{nk}(u) - 1)\} \to 0.$$

Proof. Since $f_{nk}(u) = 1 - \theta_{nk} \frac{\sigma_{nk}^2}{2} u^2$ with $|\theta_{nk}| \le 1$, it follows from (C) that 

$$\max_k |f_{nk}(u) - 1| \le \frac{u^2}{2} \max_k \sigma_{nk}^2 \to 0, \qquad \sum_k |f_{nk}(u) - 1| \le \frac{c}{2} u^2.$$

Therefore, for $n \ge n_u$ sufficiently large, $|f_{nk}(u) - 1| < \frac{1}{2}$, so that the 
$\log f_{nk}(u)$ exist and are finite, 

$$\log f_{nk}(u) = f_{nk}(u) - 1 + \theta'_{nk} |f_{nk}(u) - 1|^2, \qquad |\theta'_{nk}| \le 1,$$

and it follows that 

$$\Big| \sum_k \{\log f_{nk}(u) - (f_{nk}(u) - 1)\} \Big| \le \sum_k |f_{nk}(u) - 1|^2 \le \max_k |f_{nk}(u) - 1| \sum_k |f_{nk}(u) - 1| \to 0.$$

The comparison lemma is proved. 
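
The lemma can be checked numerically on a simple array. The sketch below is an illustration only, not part of the text: it assumes the summands are $n$ centered Bernoulli r.v.'s with parameter $\lambda/n$ (so that (C) holds with $c = \lambda$), and the names `comparison_gap`, `lam` and `u` are hypothetical.

```python
import cmath


def comparison_gap(n, lam=2.0, u=1.5):
    """Return sum_k {log f_nk(u) - (f_nk(u) - 1)} for n i.i.d. summands
    X_nk = B - p, where B ~ Bernoulli(p) and p = lam / n (assumed example)."""
    p = lam / n
    # ch.f. of the centered Bernoulli summand: e^{-iup} (1 - p + p e^{iu})
    f = cmath.exp(-1j * u * p) * (1 - p + p * cmath.exp(1j * u))
    # all n summands share the same ch.f., so the sum is n times one term;
    # cmath.log is the principal branch, valid here since f is close to 1
    return n * (cmath.log(f) - (f - 1))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, abs(comparison_gap(n)))
```

The printed moduli should shrink roughly like $1/n$, in line with the bound $\max_k |f_{nk}(u) - 1| \sum_k |f_{nk}(u) - 1|$ used in the proof.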


Let 

$$\psi_n(u) = \sum_k (f_{nk}(u) - 1) = \sum_k \int (e^{iux} - 1)\, dF_{nk}.$$

Since 

$$\int x\, dF_{nk} = 0, \qquad \sum_k \int x^2\, dF_{nk} \le c,$$

we have 

$$\psi_n(u) = \sum_k \int (e^{iux} - 1 - iux) \frac{1}{x^2}\, x^2\, dF_{nk}$$

or 

$$\psi_n(u) = \int (e^{iux} - 1 - iux) \frac{1}{x^2}\, dK_n(x),$$

where $K_n$ on $R$ is a continuous from the left nondecreasing function 
with $K_n(-\infty) = 0$, $\operatorname{Var} K_n \le c < \infty$, defined by 

$$K_n(x) = \sum_k \int_{-\infty}^{x} y^2\, dF_{nk}(y),$$

and the integrand, defined by continuity at $x = 0$, takes there the 
value $-u^2/2$. The comparison lemma becomes 

a'. Under (C), $\log \prod_k f_{nk} - \psi_n \to 0$. 

Functions of the foregoing type will be denoted in this subsection by 
$\psi$ and $K$, with or without affixes. Thus, unless otherwise stated, $\psi$ is a 
function defined on $R$ by 

$$\psi(u) = \int (e^{iux} - 1 - iux) \frac{1}{x^2}\, dK(x),$$

and $K$ is a d.f., up to a multiplicative constant, with $K(-\infty) = 0$, 
$\operatorname{Var} K \le c$; $\psi$ and $K$ will have same affixes if any. 

b. Every $e^{\psi}$ is a ch.f. with null first moment and finite variance 
$\sigma^2 = \operatorname{Var} K$, and is a limit law under (C). 


Proof. The integrand is bounded in $x$ and continuous in $u$ (or $x$) for 
every fixed $x$ (or $u$). It follows that $\psi$ is continuous on $R$ and is limit 
of Riemann-Stieltjes sums of the form $\sum_k \{iua_{nk} + \lambda_{nk}(e^{iub_{nk}} - 1)\}$ 
where 

$$\lambda_{nk} = \frac{1}{x_{nk}^2} K[x_{nk}, x_{n,k+1}), \qquad a_{nk} = -\lambda_{nk} x_{nk}, \qquad b_{nk} = x_{nk};$$

we can and do take all subdivision points $x_{nk} \ne 0$. Since every 
summand is log of a (Poisson type) ch.f., the sums are log of ch.f.'s, and so 
is their limit $\psi$ according to the continuity theorem. The second 
assertion follows, since, by elementary computations, 

$$(e^{\psi})'_{u=0} = (\psi)'_{u=0} = 0, \qquad (e^{\psi})''_{u=0} = (\psi)''_{u=0} = -\operatorname{Var} K.$$

Finally, let $X_{nk}$, $k = 1, \dots, n$, be independent r.v.'s with common log 
of ch.f. being $\psi/n$. Since $\psi/n$ corresponds to $K/n$, we have $\sigma^2 X_{nk} 
= \operatorname{Var} K/n$ while $EX_{nk} = 0$. Since $\sum_k X_{nk}$ has for ch.f. $e^{\psi}$ whatever 
be $n$ and condition (C) is fulfilled, the last assertion is proved. 

c. UNIQUENESS LEMMA. $\psi$ determines $K$, and conversely. 

Proof. Since 

$$-\psi''(u) = \int e^{iux}\, dK(x), \qquad \operatorname{Var} K < \infty, \qquad K(-\infty) = 0,$$

the inversion formula applies and $K$ is determined by $\psi$ by means of 
$\psi''$. The converse is obvious. 

d. CONVERGENCE LEMMA. Let (C) hold. If $K_n \xrightarrow{w} K$, then $\psi_n \to \psi$. 
Conversely, if $\psi_n \to \log f$, then $K_n \xrightarrow{w} K$ and $\log f = \psi$ determined 
by $K$. 

Proof. The first assertion follows at once from the extended Helly-Bray 
lemma. As for the converse, since the variations are uniformly 
bounded, the weak compactness theorem applies and there exists a 
$K$ (with $\operatorname{Var} K \le c$) such that $K_{n'} \xrightarrow{w} K$ as $n' \to \infty$ along some 
subsequence of integers. Therefore, by the same lemma, $\psi_{n'} \to \psi = \log f$ 
since $\psi_n \to \log f$. But, by the uniqueness lemma, $\psi = \log f$ 
determines $K$, and it follows that $K_n \xrightarrow{w} K$. The proposition is proved. 


Upon applying the foregoing lemmas, the answer to our problem 
follows: 


A. BOUNDED VARIANCES LIMIT THEOREM. If independent summands 
$X_{nk}$ are centered at expectations and $\max_k \sigma_{nk}^2 \to 0$, $\sum_k \sigma_{nk}^2 \le c < \infty$ 
for all $n$, then 

1° the family of limit laws of sequences $\mathcal{L}(\sum_k X_{nk})$ coincides with the 
family of laws of r.v.'s centered at expectations with finite variances and 
ch.f.'s of the form $f = e^{\psi}$, where $\psi$ is of the form 

$$\psi(u) = \int (e^{iux} - 1 - iux) \frac{1}{x^2}\, dK(x),$$

with $K$ continuous from the left and nondecreasing on $R$ and 
$\operatorname{Var} K \le c < \infty$; $\psi$ determines $K$ and conversely. 

2° $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{L}(X)$ with ch.f. necessarily of the form $e^{\psi}$ if, and 
only if, $K_n \xrightarrow{w} K$ where $K_n$ are defined by 

$$K_n(x) = \sum_k \int_{-\infty}^{x} y^2\, dF_{nk}.$$

If $\sum_k \sigma_{nk}^2 \le c < \infty$ is replaced by $\sum_k \sigma_{nk}^2 \to \sigma^2 X < \infty$, then $K_n \xrightarrow{w} K$ 
is to be replaced by $K_n \xrightarrow{c} K$. 

Proof. 1° follows from b, the comparison lemma and the convergence 
lemma. 

2° follows from 1° and the convergence lemma; and the particular 
case follows from the fact that the assumption made becomes 

$$\operatorname{Var} K_n = \sum_k \sigma_{nk}^2 \to \sigma^2 X = \operatorname{Var} K.$$

P 


EXTENSION. So far the r.v.'s under consideration were all centered 
at expectations. If we suppress this condition and set 

$$a_{nk} = EX_{nk}, \qquad \bar{F}_{nk}(x) = F_{nk}(x + a_{nk}), \qquad \bar{f}_{nk}(u) = e^{-iua_{nk}} f_{nk}(u),$$

then the foregoing results continue to apply, provided $F_{nk}$ and $f_{nk}$ are 
replaced everywhere by $\bar{F}_{nk}$ and $\bar{f}_{nk}$; and then we write $\bar{\psi}$ instead of $\psi$. 
Going back to the noncentered r.v.'s, we have to introduce limit laws 
$\mathcal{L}(X)$ with finite variances but not necessarily null expectations $a = EX$, 
whose log's of ch.f.'s are of the form $\psi(u) = iua + \bar{\psi}(u)$, so that: 

The uniqueness lemma becomes: $\psi$ determines $a$ and $K$, and 
conversely. 

In the convergence lemma, $K_n \xrightarrow{w} K$ is replaced by $K_n \xrightarrow{w} K$ and 
$a_n \to a$. 

The same is to be done in the limit theorem with $a_n = \sum_k a_{nk}$ and 
$F_{nk}$ replaced by $\bar{F}_{nk}$. 

Thus, the convergence criterion A2° becomes 

EXTENDED CONVERGENCE CRITERION. If independent summands $X_{nk}$ 
are such that $\max_k \sigma_{nk}^2 \to 0$ and $\sum_k \sigma_{nk}^2 \le c < \infty$, then $\mathcal{L}(\sum_k X_{nk}) \to 
\mathcal{L}(X)$ with ch.f. necessarily of the form $e^{\psi}$ if, and only if, $K_n \xrightarrow{w} K$ and 
$\sum_k a_{nk} \to a$ where 

$$K_n(x) = \sum_k \int_{-\infty}^{x} y^2\, dF_{nk}(y + a_{nk}), \qquad a_{nk} = EX_{nk}.$$

If $\sum_k \sigma_{nk}^2 \le c < \infty$ is replaced by $\sum_k \sigma_{nk}^2 \to \sigma^2 X < \infty$, then $K_n \xrightarrow{w} K$ 
is to be replaced by $K_n \xrightarrow{c} K$. 


Particular cases: 

1° NORMAL CONVERGENCE. The normal law $\mathcal{N}(0, 1)$ corresponds to 
$\psi(u) = -\frac{u^2}{2}$ and, hence, to $K$ defined by $K(x) = 0$ or $1$ according as 
$x < 0$ or $x > 0$ (because of the uniqueness lemma, it suffices to verify 
that this $K$ gives the above $\psi$). 

NORMAL CONVERGENCE CRITERION. Let the independent summands 
$X_{nk}$, centered at expectations, be such that $\sum_k \sigma_{nk}^2 = 1$ for all $n$: 
then $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{N}(0, 1)$ and $\max_k \sigma_{nk}^2 \to 0$ if, and only if, for 
every $\epsilon > 0$, 

$$g_n(\epsilon) = \sum_k \int_{|x| \ge \epsilon} x^2\, dF_{nk} \to 0.$$

Proof. Since 

$$\max_k \sigma_{nk}^2 = \max_k \int x^2\, dF_{nk}(x) \le \epsilon^2 + \max_k \int_{|x| \ge \epsilon} x^2\, dF_{nk} \le \epsilon^2 + g_n(\epsilon),$$

it follows that $g_n(\epsilon) \to 0$ for every $\epsilon > 0$ implies (letting $n \to \infty$ and 
then $\epsilon \to 0$ in the foregoing relation) $\max_k \sigma_{nk}^2 \to 0$. Then, immediate 
computations show that the convergence criterion A2° is equivalent to 
$g_n(\epsilon) \to 0$ for every $\epsilon > 0$. 

Upon setting $X_{nk} = \frac{X_k}{s_n}$, $k = 1, \dots, n$, $EX_k = 0$, $s_n^2 = \sum_k \sigma^2 X_k$, 
we obtain the classical normal convergence criterion. Liapounov's 
theorem follows from 

$$\int_{|x| \ge \epsilon s_n} x^2\, dF_k \le \frac{1}{\epsilon^{\delta} s_n^{\delta}} \int |x|^{2+\delta}\, dF_k.$$


2° POISSON CONVERGENCE. The Poisson law $\mathcal{P}(\lambda)$ corresponds to 
$\psi(u) = iu\lambda + \lambda(e^{iu} - 1 - iu) = iua + \bar{\psi}(u)$ and, hence, the function 
$K$ which corresponds to $\bar{\psi}$ is defined by $K(x) = 0$ or $\lambda$ according as 
$x < 1$ or $x > 1$. The extended convergence criterion yields, by 
immediate transformations, the following 

POISSON CONVERGENCE CRITERION. If the independent summands $X_{nk}$ 
are such that $\max_k \sigma_{nk}^2 \to 0$ and $\sum_k \sigma_{nk}^2 \to \lambda$, then $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{P}(\lambda)$ 
if, and only if, $\sum_k EX_{nk} \to \lambda$ and, for every $\epsilon > 0$, 

$$\sum_k \int_{|x - 1| \ge \epsilon} x^2\, dF_{nk}(x + EX_{nk}) \to 0.$$



*§ 23. SOLUTION OF THE CENTRAL LIMIT PROBLEM 


We consider now the general problem. As was pointed out, the 
method of attack will be essentially the same as in the case of bounded 
variances. The computational difficulties will arise from two facts. 
(1) Even existence of first moments is not assumed, and the centerings, 
instead of being at expectations, will have to be at truncated 
expectations. (2) The functions $K$ defined previously are not necessarily 
of bounded variation and, even when they are, they are not assumed 
to be of uniformly bounded variation. They will have to be replaced 
by functions of the form $\Psi_n(x) = \sum_k \int_{-\infty}^{x} \frac{y^2}{1 + y^2}\, d\bar{F}_{nk}$, where $\bar{F}_{nk}$ will be 
d.f.'s of the summands centered at truncated expectations. This will 
lead to limit laws with log ch.f.'s of a more complicated form, which 
we investigate first. 

23.1 A family of limit laws; the infinitely decomposable laws. A 
law $\mathcal{L}$ and its ch.f. $f$ are said to be infinitely decomposable (i.d.) if, for 
every integer $n$, there exist (on some pr. space) $n$ independent and 
identically distributed r.v.'s $X_{nk}$ such that $\mathcal{L} = \mathcal{L}(\sum_{k=1}^{n} X_{nk})$; in other words, 
for every $n$ there exists a ch.f. $f_n$ such that $f = f_n^n$. If $f \ne 0$, then 
$\log f$ exists and is finite and $f_n = e^{\frac{1}{n} \log f}$; unless otherwise stated, we 
select for log of a ch.f. its principal branch (vanishing at $u = 0$) and for 
the $n$th root of $f$ we take the function defined by the preceding equality. 

Clearly, if a law is i.d., so is its type. The degenerate, normal, and 
Poisson type are i.d., since if $\log f(u) = iua$ or $iua - \frac{\sigma^2}{2} u^2$ or $iua + 
\lambda(e^{iub} - 1)$, then $\frac{1}{n} \log f(u)$ has the same form whatever be $n$. More 
generally, the limit laws $e^{\psi}$ obtained in the case of bounded variances 
are i.d., since the corresponding functions $\psi$ are such that $\psi/n$ is log of 
a ch.f. of the same form (with $a/n \in R$ and $K/n$ d.f. up to a 
multiplicative constant). In fact 


a. The i.d. family belongs to that of limit laws of the Central Limit 
Problem. 

For, on the one hand, the uan condition for independent and identically 
distributed r.v.'s $X_{nk}$ which figure in the definition of i.d. laws becomes 
convergence of their common law to the degenerate at 0, that is, $f_n \to 1$; 
on the other hand, 

b. If, for every $n$, $f = f_n^n$ where $f_n$ is a ch.f., then $f_n \to 1$; and, 
moreover, $f \ne 0$. 

Proof. Since $|f| \le 1$, we have $|f_n|^2 = |f|^{2/n} \to g$ with $g(u) = 0$ 
or $1$ according as $f(u) = 0$ or $f(u) \ne 0$. Since $f$ is continuous and 
$f(0) = 1$, there exists a neighborhood of the origin where $|f(u)| > 0$ 
and, hence, $g(u) = 1$, so that $g$ is continuous in this neighborhood. 
Thus, the sequence $|f_n|^2$ of ch.f.'s converges to a function $g$ continuous 
at the origin, the continuity theorem applies, and $g$ is a ch.f. Therefore, 
$g$ is continuous on $R$ with $g(0) = 1$ and, since it takes at most two values 
$0$ and $1$, it reduces to $1$. Consequently, $f \ne 0$, $\log f$ exists and 
is finite, and $f_n = e^{\frac{1}{n} \log f} \to 1$. The proposition is proved. 
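
Two short illustrations of proposition b, added here as worked examples under standard facts about these ch.f.'s. For the Poisson law with parameter $\lambda$, the $n$th root is again a Poisson ch.f. and tends to 1:

$$f(u) = e^{\lambda(e^{iu} - 1)}, \qquad f_n(u) = e^{\frac{\lambda}{n}(e^{iu} - 1)} \to 1 \quad (n \to \infty).$$

In the other direction, the nonvanishing property gives a quick test for laws that are not i.d. The symmetric Bernoulli law on $\{0, 1\}$ has

$$f(u) = \tfrac{1}{2}(1 + e^{iu}), \qquad f(\pi) = 0,$$

so that, by b, it cannot be infinitely decomposable.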


We shall see later that the family of limit laws of the problem coin- 
cides with the i.d. one. This explains the property below. 


A. CLOSURE THEOREM. The i.d. family is closed under compositions 
and passages to the limit. 

Proof. If $f$ and $f'$ are i.d. ch.f.'s, then, for every $n$, there exist ch.f.'s 
$f_n$ and $f'_n$ such that $f = f_n^n$, $f' = f'^{\,n}_n$, so that $ff' = (f_n f'_n)^n$ where 
$f_n f'_n$ are ch.f.'s, and the first assertion is proved. 

On the other hand, if a sequence $f_n$ of i.d. ch.f.'s converges to a ch.f. 
$f$, then, for every integer $m$, $|f_n|^{2/m} \to |f|^{2/m}$ and, by the continuity 
theorem, $|f|^{2/m}$ is a ch.f. Therefore, $|f|^2$ is an i.d. ch.f. and, hence, by 
b, $f \ne 0$. Since $\log f$ exists and is finite, and 

$$f_n^{1/m} = e^{\frac{1}{m} \log f_n} \to e^{\frac{1}{m} \log f} = f^{1/m},$$

it follows that $f^{1/m}$ is a ch.f., so that $f$ is i.d. This concludes the proof. 

The basic feature of i.d. laws (hence, as we shall see, of all the limit 
laws of the Central Limit Problem) is that they are constructed by 
means of Poisson type laws. This is made precise in the theorem below 
and made explicit by the representation theorem which will follow. 


B. STRUCTURE THEOREM. A ch.f. is i.d. if, and only if, it is the limit 
of sequences of products of Poisson type ch.f.'s. 

In other words, the class of i.d. laws coincides with the limit laws of 
sequences of sums of independent Poisson type r.v.'s. 

Proof. Products $f_n$ of Poisson type ch.f.'s are defined by finite sums 
of the form 

$$\log f_n(u) = \sum_k \{iua_{nk} + \lambda_{nk}(e^{iub_{nk}} - 1)\}, \qquad \lambda_{nk} \ge 0,$$

so that the functions $\frac{1}{m} \log f_n$ are log of ch.f.'s (of the same kind) 
whatever be the fixed integer $m$, and the $f_n$ are i.d. ch.f.'s. Thus, by A, if 
$f_n \to f$ ch.f., then $f$ is i.d. This proves the "if" assertion. 

Conversely, if $f$ is i.d., then $\log f$ exists and is finite and 

$$n(f^{1/n} - 1) \to \log f, \qquad f^{1/n}(u) - 1 = \int (e^{iux} - 1)\, dF_n(x),$$

where $F_n$ are d.f.'s. By taking Riemann-Stieltjes sums which 
approximate $f^{1/n}(u) - 1$ by less than $1/n^2$, the "only if" assertion follows, 
and the proof is terminated. 


In what precedes, $\psi_n(u) = \int (e^{iux} - 1)\, n\, dF_n(x) \to \log f(u)$ and $\psi_n$ is 
itself log of an i.d. ch.f. Since $\operatorname{Var}(nF_n) = n \to \infty$, brutal interchange 
of integration and passage to the limit is excluded. However, the 
integral inequality in 13.4 yields $\operatorname{Var} \Psi_n \le c < \infty$ with $d\Psi_n(x) = \frac{x^2}{1 + x^2}\, n\, dF_n(x)$, 
so that the weak compactness theorem applies. But the 
integrand for $d\Psi_n(x)$ is undetermined at $x = 0$, and we have to modify 
it. This leads to the $\psi$-functions below: 

Unless otherwise stated, $\psi$, with or without affixes, will denote a 
function defined on $R$ by 

$$\psi(u) = iu\alpha + \int \Big( e^{iux} - 1 - \frac{iux}{1 + x^2} \Big) \frac{1 + x^2}{x^2}\, d\Psi(x),$$

where $\alpha \in R$ and $\Psi$ denotes a d.f. up to a multiplicative constant, 
with $\Psi(-\infty) = 0$; the corresponding $\psi$, $\alpha$, $\Psi$ will have same affixes if 
any. The value of the integrand at $x = 0$, defined by continuity, is 
$-u^2/2$. 


c. Every $e^{\psi}$ is an i.d. ch.f. 

Proof. We use repeatedly the fact that the class of log's of ch.f.'s is 
closed under additions. 

The integrand is bounded in $x$ and continuous in $u$ (or $x$) for every fixed 
$x$ (or $u$). It follows that the integral is continuous in $u$ and is limit of 
Riemann-Stieltjes sums of the form 

$$\sum_k \{iua_{nk} + \lambda_{nk}(e^{iub_{nk}} - 1)\}$$

where 

$$\lambda_{nk} = \frac{1 + x_{nk}^2}{x_{nk}^2}\, \Psi[x_{nk}, x_{n,k+1}), \qquad a_{nk} = -\frac{\lambda_{nk} x_{nk}}{1 + x_{nk}^2}, \qquad b_{nk} = x_{nk};$$

we can and do take all $x_{nk} \ne 0$. Since every nonvanishing summand 
is log of a (Poisson type) ch.f., the sums are log's of ch.f.'s, and so is the 
integral according to the continuity theorem. Since $iu\alpha$ is log of a 
ch.f., so is $\psi$ and, hence, so is every $\psi/n$ corresponding to $\alpha/n \in R$ and 
$\Psi/n$, a d.f. up to a multiplicative constant. The assertion is proved. 


Remark. If $\int x^2\, d\Psi(x) < \infty$, then 

$$\psi(u) = iua + \int (e^{iux} - 1 - iux) \frac{1}{x^2}\, dK(x)$$

where 

$$a = \alpha + \int x\, d\Psi(x) \in R, \qquad dK(x) = (1 + x^2)\, d\Psi(x),$$

and the i.d. ch.f. $e^{\psi}$ has for first moment $a$ and for variance $\operatorname{Var} K < \infty$ 
(take the first two derivatives at $u = 0$). Conversely, if an i.d. ch.f. 
$e^{\psi}$ has second (hence first) finite moment, then $\int x^2\, d\Psi(x) < \infty$ (take 
the second symmetric derivative at $u = 0$). Thus, the family of all 
limit laws in the case of bounded variances coincides with the 
subfamily of i.d. laws with finite second moments. 
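
The passage from the general $\psi$ to the bounded-variance form in the Remark rests on one algebraic identity, spelled out here as a worked step (assuming $\int x^2\, d\Psi < \infty$, which together with $\operatorname{Var} \Psi < \infty$ makes $\int |x|\, d\Psi$ finite):

$$\Big( e^{iux} - 1 - \frac{iux}{1 + x^2} \Big) \frac{1 + x^2}{x^2} = (e^{iux} - 1 - iux) \frac{1 + x^2}{x^2} + iux \cdot \frac{x^2}{1 + x^2} \cdot \frac{1 + x^2}{x^2} = (e^{iux} - 1 - iux) \frac{1 + x^2}{x^2} + iux.$$

Integrating with respect to $d\Psi$ and adding $iu\alpha$ gives

$$\psi(u) = iu \Big( \alpha + \int x\, d\Psi(x) \Big) + \int (e^{iux} - 1 - iux) \frac{1}{x^2}\, (1 + x^2)\, d\Psi(x),$$

which is the stated form with $a = \alpha + \int x\, d\Psi$ and $dK = (1 + x^2)\, d\Psi$.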


We establish now two properties of functions $\psi$ corresponding to the 
unicity and continuity theorems for ch.f.'s. They will be reduced to 
these theorems by making correspond to functions $\psi$ functions $\varphi$ and 
$\Phi$, with same affixes as $\psi$ if any. We define $\varphi$ on $R$ by 

$$\varphi(u) = \psi(u) - \int_0^1 \frac{\psi(u + h) + \psi(u - h)}{2}\, dh.$$

We have, upon replacing $\psi$ by its defining relation and interchanging 
the integrations, 

$$\varphi(u) = \int \Big\{ \int_0^1 e^{iux}(1 - \cos hx) \frac{1 + x^2}{x^2}\, dh \Big\}\, d\Psi = \int e^{iux}\, d\Phi$$

with 

$$\Phi(x) = \int_{-\infty}^{x} \Big( 1 - \frac{\sin y}{y} \Big) \frac{1 + y^2}{y^2}\, d\Psi(y).$$

Since 

$$0 < c' \le \Big( 1 - \frac{\sin x}{x} \Big) \frac{1 + x^2}{x^2} \le c'' < \infty,$$

where $c'$ and $c''$ are independent of $x \in R$, it follows that $\Phi$ is 
nondecreasing on $R$ with 

$$c' \operatorname{Var} \Psi \le \operatorname{Var} \Phi \le c'' \operatorname{Var} \Psi < \infty$$

and 

$$\Psi(x) = \int_{-\infty}^{x} d\Phi(y) \Big/ \Big( 1 - \frac{\sin y}{y} \Big) \frac{1 + y^2}{y^2}.$$

C. UNICITY THEOREM. There is a one-to-one correspondence between 
functions $\psi$ and couples $(\alpha, \Psi)$. 

For this reason we shall sometimes write $\psi = (\alpha, \Psi)$. 

Proof. By definition, every couple $(\alpha, \Psi)$ determines a function $\psi$. 
Conversely, if $\psi$ is given, then, by the foregoing considerations, $\psi$ 
determines a function $\varphi$ which is a ch.f. (up to a constant factor). By 
the inversion formula for ch.f.'s, $\varphi$ determines $\Phi$ and, in its turn, $\Phi$ 
determines $\Psi$; furthermore, $\psi$ and $\Psi$ determine $\alpha$, which completes the 
proof. 


D. CONVERGENCE THEOREM. If $\alpha_n \to \alpha$ and $\Psi_n \xrightarrow{c} \Psi$, then $\psi_n \to \psi$. 
Conversely, if $\psi_n \to g$ continuous at the origin, then $\alpha_n \to \alpha$ and $\Psi_n \xrightarrow{c} \Psi$ 
such that $g = \psi = (\alpha, \Psi)$. 

Proof. The first assertion follows at once by the Helly-Bray theorem. 
As for the converse, since the sequence $e^{\psi_n}$ of i.d. ch.f.'s converges to 
$e^{g}$ continuous at the origin, this convergence is uniform in every finite 
interval and, by 23.1b and A, $e^{g}$ is an i.d. ch.f. with $e^{g} \ne 0$. Hence, 
$g$ is finite and continuous on $R$, the sequence $\psi_n$ converges to $g$ 
uniformly on every finite interval, and 

$$\varphi_n(u) = \psi_n(u) - \int_0^1 \frac{\psi_n(u + h) + \psi_n(u - h)}{2}\, dh \to g(u) - \int_0^1 \frac{g(u + h) + g(u - h)}{2}\, dh,$$

continuous on $R$. In particular, 

$$\operatorname{Var} \Phi_n = \varphi_n(0) \to -\int_0^1 \frac{g(h) + g(-h)}{2}\, dh < \infty,$$

so that variations of the $\Phi_n$ are uniformly bounded. Thus, the 
continuity theorem applies to the sequence $\varphi_n$, and there exists a 
nondecreasing function $\Phi$ of bounded variation on $R$ such that, upon applying 
the Helly-Bray theorem, at every continuity point $x$ of $\Phi$ as well as 
for $x = \pm\infty$, 

$$\Psi_n(x) = \int_{-\infty}^{x} d\Phi_n(y) \Big/ \Big( 1 - \frac{\sin y}{y} \Big) \frac{1 + y^2}{y^2} \to \Psi(x) = \int_{-\infty}^{x} d\Phi(y) \Big/ \Big( 1 - \frac{\sin y}{y} \Big) \frac{1 + y^2}{y^2}.$$

Hence, $\Psi_n \xrightarrow{c} \Psi$ and, by the same theorem, 

$$iu\alpha_n = \psi_n(u) - \int \Big( e^{iux} - 1 - \frac{iux}{1 + x^2} \Big) \frac{1 + x^2}{x^2}\, d\Psi_n \to g(u) - \int \Big( e^{iux} - 1 - \frac{iux}{1 + x^2} \Big) \frac{1 + x^2}{x^2}\, d\Psi = iu\alpha.$$

This terminates the proof. 
E. REPRESENTATION THEOREM. The family of i.d. ch.f.'s coincides 
with the family of ch.f.'s of the form $e^{\psi}$. 

Proof. According to 23.1c, every $e^{\psi}$ is an i.d. ch.f. Conversely if, 
for every $n$, $f = f_n^n$ where $f_n$ is a ch.f. corresponding to a d.f. $F_n$, then, 
upon applying the preceding convergence theorem, we obtain 

$$\log f(u) = \lim n(f^{1/n}(u) - 1) = \lim n(f_n(u) - 1) = \lim \int (e^{iux} - 1)\, n\, dF_n$$

$$= \lim \Big\{ iu \int \frac{nx}{1 + x^2}\, dF_n + \int \Big( e^{iux} - 1 - \frac{iux}{1 + x^2} \Big) \frac{1 + x^2}{x^2} \cdot \frac{x^2}{1 + x^2}\, n\, dF_n \Big\}$$

$$= \lim \psi_n = \text{some } \psi,$$

with 

$$d\Psi_n(x) = \frac{x^2}{1 + x^2}\, n\, dF_n(x) \quad \text{and} \quad \Psi_n \xrightarrow{c} \Psi.$$

The theorem is proved. 

23.2 The uan condition. The main computational difficulties arise 
in connection with the uan condition, and we have to investigate it in 
detail. We recall that given a sequence of sums $\sum_k X_{nk}$ of independent 
r.v.'s, the uan condition is that, for every $\epsilon > 0$, 

$$\max_k P[\,|X_{nk}| \ge \epsilon\,] = \max_k \int_{|x| \ge \epsilon} dF_{nk} \to 0.$$

a. The uan condition implies that 

$$\max_k |\mu X_{nk}| \to 0, \qquad \max_k \int_{|x| < \tau} |x|^r\, dF_{nk} \to 0, \qquad r > 0, \ \tau > 0 \text{ finite}.$$


Proof. The medians of a r.v. belong to any interval such that the 
pr. for the r.v. to be in the interval is greater than 1/2. Since under 
the uan condition $\min_k P[\,|X_{nk}| < \epsilon\,] > 1/2$ whatever be $\epsilon > 0$, 
provided $n \ge n_{\epsilon}$ sufficiently large, it follows that $\max_k |\mu X_{nk}| < \epsilon$ for 
$n \ge n_{\epsilon}$, and the first assertion is proved. 

Under the same condition, by letting $n \to \infty$ and then $\epsilon \to 0$, we 
have 

$$\max_k \int_{|x| < \tau} |x|^r\, dF_{nk} \le \epsilon^r + \max_k \int_{\epsilon \le |x| < \tau} |x|^r\, dF_{nk} \le \epsilon^r + \tau^r \max_k \int_{|x| \ge \epsilon} dF_{nk} \to 0,$$

and the second assertion is proved. 


A. UAN CRITERIA. The uan condition is equivalent to 

$$\max_k \int \frac{x^2}{1 + x^2}\, dF_{nk} \to 0 \quad \text{or} \quad \max_k |f_{nk} - 1| \to 0$$

uniformly on every finite interval. 

Proof. Under the uan condition, by letting $n \to \infty$ and then $\epsilon \to 0$, 
we have 

$$\max_k \int \frac{x^2}{1 + x^2}\, dF_{nk} \le \epsilon^2 + \max_k \int_{|x| \ge \epsilon} dF_{nk} \to 0$$

and, for $|u| \le b < \infty$, 

$$\max_k |f_{nk}(u) - 1| \le \max_k \Big| \int_{|x| < \epsilon} (e^{iux} - 1)\, dF_{nk} \Big| + \max_k \Big| \int_{|x| \ge \epsilon} (e^{iux} - 1)\, dF_{nk} \Big| \le b\epsilon + 2 \max_k \int_{|x| \ge \epsilon} dF_{nk} \to 0.$$

Conversely, if $\max_k \int \frac{x^2}{1 + x^2}\, dF_{nk} \to 0$, then, for every $\epsilon > 0$, 

$$\max_k \int_{|x| \ge \epsilon} dF_{nk} \le \frac{1 + \epsilon^2}{\epsilon^2} \max_k \int_{|x| \ge \epsilon} \frac{x^2}{1 + x^2}\, dF_{nk} \to 0,$$

and the uan condition holds. 

Since, upon replacing $f_{nk}(u)$ by $\int e^{iux}\, dF_{nk}$ and interchanging the 
integrations, we have 

$$\max_k \int \frac{x^2}{1 + x^2}\, dF_{nk} = \max_k \int_0^{\infty} e^{-u}(1 - \Re f_{nk}(u))\, du \le \int_0^{\infty} e^{-u} \max_k |f_{nk}(u) - 1|\, du,$$

it follows, by the dominated convergence theorem, that $\max_k |f_{nk} - 1| 
\to 0$ implies the uan condition, and the proof is complete. 
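
The last step of the proof relies on the elementary identity $\int_0^{\infty} e^{-u}(1 - \cos ux)\, du = \frac{x^2}{1 + x^2}$. The sketch below checks it numerically with a composite Simpson rule; the truncation point `upper`, the step count and the function names are arbitrary choices made for this illustration.

```python
import math


def closed_form(x):
    """Right-hand side of the identity: x^2 / (1 + x^2)."""
    return x * x / (1.0 + x * x)


def laplace_side(x, upper=40.0, steps=60000):
    """Composite Simpson approximation of int_0^upper e^{-u}(1 - cos(ux)) du.
    The tail beyond `upper` is at most 2 e^{-upper}, hence negligible here."""
    h = upper / steps  # steps must be even for Simpson's rule

    def g(u):
        return math.exp(-u) * (1.0 - math.cos(u * x))

    total = g(0.0) + g(upper)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(j * h)
    return total * h / 3.0


if __name__ == "__main__":
    for x in (0.1, 1.0, 3.0):
        print(x, closed_form(x), laplace_side(x))
```

Integrating the identity with respect to $dF_{nk}$ and interchanging the integrations gives the equality between $\int \frac{x^2}{1 + x^2}\, dF_{nk}$ and $\int_0^{\infty} e^{-u}(1 - \Re f_{nk}(u))\, du$ used above.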
From now on, we fix a finite $\tau > 0$ and, for every d.f. $F$, with or without 
affixes, we set 

$$a = \int_{|x| < \tau} x\, dF, \qquad \bar{F}(x) = F(x + a), \qquad \bar{f}(u) = \int e^{iux}\, d\bar{F},$$

with same affixes if any. 

We observe that $|a| < \tau$ and that the "bar" does not mean 
"complex-conjugate." 

COROLLARY 1. Under the uan condition, $\max_k |\bar{f}_{nk} - 1| \to 0$ 
uniformly on every finite interval. 

Since, by a, $\max_k |a_{nk}| \le \max_k \int_{|x| < \tau} |x|\, dF_{nk} \to 0$, the r.v.'s $\bar{X}_{nk} = 
X_{nk} - a_{nk}$ obey the uan condition, and the assertion follows by A. 

COROLLARY 2. Under the uan condition, given $b < \infty$, all $\log f_{nk}(u)$ 
exist and are finite for $|u| \le b$ and $n \ge n_b$ sufficiently large, and 

$$\log f_{nk}(u) = f_{nk}(u) - 1 + \theta_{nk} |f_{nk}(u) - 1|^2, \qquad |\theta_{nk}| \le 1;$$

similarly for the $\bar{f}_{nk}(u)$. 

This follows from A and $\log z = (z - 1) + \theta |z - 1|^2$ for $|z - 1| < \frac{1}{2}$. 

From now on, if $b > 0$ is given, then we take $n \ge n_b$, so that the 
foregoing relations hold. 


We are now in a position to establish the inequalities which will lead 
almost at once to the solution of the Central Limit Problem. 

B. CENTRAL INEQUALITIES. Under the uan condition, for $n \ge n_b$ 
sufficiently large, there exist two finite positive constants $c_1 = c_1(b, \tau)$ and 
$c_2 = c_2(b, \tau)$ such that 

$$c_1 \max_{|u| \le b} |\bar{f}_{nk}(u) - 1| \le \int \frac{x^2}{1 + x^2}\, d\bar{F}_{nk} \le c_2 \int_0^b \big| \log |f_{nk}(u)| \big|\, du.$$

The inequalities follow at once, upon applying a, from two inequalities, 
valid for arbitrary r.v.'s, that we establish now. We shall use repeatedly 
the two relations 

$$\int g(x)\, d\bar{F}(x) = \int g(x)\, dF(x + a) = \int g(x - a)\, dF(x)$$

and 

$$\int_{|x| < \tau} (x - a)\, dF = a - a \int_{|x| < \tau} dF = a \int_{|x| \ge \tau} dF.$$


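B$_1$. LOWER BOUND. There exists a finite positive number $c_1 = c_1(a, b, \tau)$ 
such that 

$$c_1 \max_{|u| \le b} |\bar{f}(u) - 1| \le \int \frac{x^2}{1 + x^2}\, d\bar{F}.$$

Proof. Since, for $|u| \le b < \infty$, 

$$|\bar{f}(u) - 1| = \Big| \int (e^{iu(x - a)} - 1)\, dF \Big| \le 2 \int_{|x| \ge \tau} dF + b \Big| \int_{|x| < \tau} (x - a)\, dF \Big| + \frac{b^2}{2} \int_{|x| < \tau} (x - a)^2\, dF$$

$$= (2 + |a| b) \int_{|x| \ge \tau} dF + \frac{b^2}{2} \int_{|x| < \tau} (x - a)^2\, dF,$$

where 

$$\int_{|x| \ge \tau} dF \le \frac{1 + (\tau - |a|)^2}{(\tau - |a|)^2} \int_{|x| \ge \tau} \frac{(x - a)^2}{1 + (x - a)^2}\, dF$$

and 

$$\int_{|x| < \tau} (x - a)^2\, dF \le \{1 + (\tau + |a|)^2\} \int_{|x| < \tau} \frac{(x - a)^2}{1 + (x - a)^2}\, dF,$$

it follows that 

$$|\bar{f}(u) - 1| \le \frac{1}{c_1} \int \frac{x^2}{1 + x^2}\, d\bar{F}$$

where 

$$\frac{1}{c_1} = \{1 + (\tau + |a|)^2\} \Big\{ \frac{2 + |a| b}{(\tau - |a|)^2} + \frac{b^2}{2} \Big\},$$

and the asserted inequality is proved. 

Under the uan condition, for $n$ sufficiently large, we have, according 
to a, $|a| < \frac{\tau}{2}$, and we can take for $c_1 = c_1(b, \tau)$ the value of $c_1$ 
obtained upon replacing $|a|$ by $\frac{\tau}{2}$. This proves the left-hand side 
central inequality. 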


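B$_2$. UPPER BOUND. For $\tau > |\mu|$, $\mu$ a median of $F$, there exists a finite 
positive number $c_2 = c_2(\mu, b, \tau)$ such that 

$$\int \frac{x^2}{1 + x^2}\, d\bar{F} \le c_2 \int_0^b (1 - |f(u)|^2)\, du.$$

If $f(u) \ne 0$ for $|u| \le b$, then $1 - |f(u)|^2$ can be replaced by $2 \big| \log |f(u)| \big|$. 

Proof. On account of the elementary inequality 

$$1 - |f|^2 \le -\log |f|^2 = 2 \big| \log |f| \big|,$$

the second assertion follows from the first one. To prove the first 
assertion, we shall use the symmetrization method and denote by $F^s$ the 
d.f. of the symmetrized r.v. $X - X'$ where $X$ and $X'$ are independent 
and identically distributed, so that the corresponding ch.f. $f^s = |f|^2$. 

From the elementary inequality 

$$\Big( 1 - \frac{\sin bx}{bx} \Big) \frac{1 + x^2}{x^2} \ge c(b) > 0, \qquad x \in R,$$

and the relation (obtained upon interchanging the integrations) 

$$\int_0^b (1 - |f(u)|^2)\, du = \int \Big\{ \int_0^b (1 - \cos ux)\, du \Big\}\, dF^s = b \int \Big( 1 - \frac{\sin bx}{bx} \Big) \frac{1 + x^2}{x^2} \cdot \frac{x^2}{1 + x^2}\, dF^s,$$

it follows that 

$$(1) \qquad \int_0^b (1 - |f(u)|^2)\, du \ge b\, c(b) \int \frac{x^2}{1 + x^2}\, dF^s.$$

We pass now from $F^s$ to $F^{\mu}$, the d.f. of $X - \mu$, and set 

$$q(t) = P[\,|X - \mu| \ge t\,], \qquad q^s(t) = P[\,|X^s| \ge t\,], \qquad t \in [0, +\infty),$$

so that, upon applying the weak symmetrization lemma (which says 
that $q \le 2q^s$) and integrating by parts, we obtain 

$$(2) \qquad \int \frac{x^2}{1 + x^2}\, dF^{\mu} = -\int_0^{\infty} \frac{t^2}{1 + t^2}\, dq(t) = \int_0^{\infty} q(t)\, d\Big( \frac{t^2}{1 + t^2} \Big) \le 2 \int_0^{\infty} q^s(t)\, d\Big( \frac{t^2}{1 + t^2} \Big) = 2 \int \frac{x^2}{1 + x^2}\, dF^s.$$

Now, we pass from $F^{\mu}$ to $\bar{F}$. From the elementary inequality 

$$(x - a)^2 \le (x - \mu)^2 + 2(\mu - a)(x - a),$$

it follows that 

$$\int_{|x| < \tau} (x - a)^2\, dF \le \int_{|x| < \tau} (x - \mu)^2\, dF + 2(\tau + |\mu|) \Big| \int_{|x| < \tau} (x - a)\, dF \Big| \le \int_{|x| < \tau} (x - \mu)^2\, dF + 2\tau(\tau + |\mu|) \int_{|x| \ge \tau} dF$$

and, hence, 

$$\int \frac{x^2}{1 + x^2}\, d\bar{F} = \int \frac{(x - a)^2}{1 + (x - a)^2}\, dF \le \int_{|x| < \tau} (x - a)^2\, dF + \int_{|x| \ge \tau} dF \le \int_{|x| < \tau} (x - \mu)^2\, dF + \{1 + 2\tau(\tau + |\mu|)\} \int_{|x| \ge \tau} dF.$$

Since 

$$\int_{|x| < \tau} (x - \mu)^2\, dF \le \{1 + (\tau + |\mu|)^2\} \int_{|x| < \tau} \frac{(x - \mu)^2}{1 + (x - \mu)^2}\, dF \le \{1 + (\tau + |\mu|)^2\} \int \frac{x^2}{1 + x^2}\, dF^{\mu}$$

and 

$$\int_{|x| \ge \tau} dF \le \frac{1 + (\tau - |\mu|)^2}{(\tau - |\mu|)^2} \int_{|x| \ge \tau} \frac{(x - \mu)^2}{1 + (x - \mu)^2}\, dF \le \frac{1 + (\tau + |\mu|)^2}{(\tau - |\mu|)^2} \int \frac{x^2}{1 + x^2}\, dF^{\mu},$$

it follows that 

$$(3) \qquad \int \frac{x^2}{1 + x^2}\, d\bar{F} \le c' \int \frac{x^2}{1 + x^2}\, dF^{\mu}$$

where 

$$c' = c'(\mu, \tau) = \{1 + (\tau + |\mu|)^2\} \Big\{ 1 + \frac{1 + 2\tau(\tau + |\mu|)}{(\tau - |\mu|)^2} \Big\}.$$

Together, the inequalities (1), (2), and (3) yield the inequality 

$$\int \frac{x^2}{1 + x^2}\, d\bar{F} \le c_2 \int_0^b (1 - |f(u)|^2)\, du$$

with $c_2 = \frac{2c'}{b\, c(b)}$, and the proof is concluded. 

Under the uan condition, for $n \ge n_{\tau}$ sufficiently large, $|\mu| < \frac{\tau}{2}$ and 
we can take for $c_2 = c_2(b, \tau)$ the value of $c_2$ obtained upon replacing 
$|\mu|$ by $\frac{\tau}{2}$. This proves the right-hand side central inequality. 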


23.3 Central Limit Theorem. We are ready for the solution of the 
Central Limit Problem and can follow the same approach as in the case 
of bounded variances, since 

a. BOUNDEDNESS LEMMA. Under the uan condition, if $\prod_k |f_{nk}| \to |f|$ 
continuous, then there exists a finite constant $c > 0$ such that 

$$\sum_k \int \frac{x^2}{1 + x^2}\, d\bar{F}_{nk} \le c < \infty.$$

Proof. It suffices to prove the assertion for $n$ sufficiently large so 
that, by 23.2A, Corollary 2, all $\log f_{nk}$ exist and are finite. Let $b > 0$ 
be sufficiently small so that, for $|u| \le b$, $|f(u)| > 0$, and $\log |f(u)|$ 
exists and is finite. Since $|f|^2$ is a ch.f., $\sum_k \log |f_{nk}| \to \log |f|$ 
uniformly on $[-b, +b]$, and, by the right-hand side central inequality, 

$$\sum_k \int \frac{x^2}{1 + x^2}\, d\bar{F}_{nk} \le -c_2 \int_0^b \sum_k \log |f_{nk}(u)|\, du \to -c_2 \int_0^b \log |f(u)|\, du < \infty.$$

The assertion follows. 


b. COMPARISON LEMMA. Under the uan condition, if there exists a 
constant $c$ such that whatever be $n$ 

$$\sum_k \int \frac{x^2}{1 + x^2}\, d\bar{F}_{nk} \le c < \infty,$$

then 

$$\sum_k \{\log \bar{f}_{nk}(u) - (\bar{f}_{nk}(u) - 1)\} \to 0, \qquad u \in R.$$

Proof. By 23.2A, Corollaries 1 and 2, $\max_k |\bar{f}_{nk} - 1| \to 0$ and, 
given $b > 0$, for $|u| \le b$ and $n$ sufficiently large, 

$$\log \bar{f}_{nk} = \bar{f}_{nk} - 1 + \theta_{nk} |\bar{f}_{nk} - 1|^2, \qquad |\theta_{nk}| \le 1.$$

By the left-hand side central inequality 

$$\sum_k |\bar{f}_{nk}(u) - 1| \le \frac{1}{c_1} \sum_k \int \frac{x^2}{1 + x^2}\, d\bar{F}_{nk} \le \frac{c}{c_1} < \infty.$$

It follows, by taking $b > |u|$ where $u \in R$ is arbitrarily fixed, that 

$$\Big| \sum_k \{\log \bar{f}_{nk}(u) - (\bar{f}_{nk}(u) - 1)\} \Big| \le \sum_k |\bar{f}_{nk}(u) - 1|^2 \le \frac{c}{c_1} \max_k |\bar{f}_{nk}(u) - 1| \to 0,$$

and the lemma is proved. 


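Since (omitting the subscripts) 

$$\log \bar{f}(u) - (\bar{f}(u) - 1) = \log f(u) - \Big\{ iua + \int (e^{iux} - 1)\, d\bar{F} \Big\}$$

and 

$$\int (e^{iux} - 1)\, d\bar{F} = iu \int \frac{x}{1 + x^2}\, d\bar{F} + \int \Big( e^{iux} - 1 - \frac{iux}{1 + x^2} \Big) \frac{1 + x^2}{x^2} \cdot \frac{x^2}{1 + x^2}\, d\bar{F},$$

the sums which figure in the comparison lemma are 

$$\log \prod_k f_{nk}(u) - \psi_n(u)$$

where 

$$\psi_n(u) = iu\alpha_n + \int \Big( e^{iux} - 1 - \frac{iux}{1 + x^2} \Big) \frac{1 + x^2}{x^2}\, d\Psi_n(x)$$

with 

$$\alpha_n = \sum_k \Big\{ a_{nk} + \int \frac{x}{1 + x^2}\, d\bar{F}_{nk} \Big\}, \qquad \Psi_n(x) = \sum_k \int_{-\infty}^{x} \frac{y^2}{1 + y^2}\, d\bar{F}_{nk}(y).$$
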

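A. CENTRAL LIMIT THEOREM. Let $X_{nk}$ be uan independent summands. 

1° The family of limit laws of sequences $\mathcal{L}(\sum_k X_{nk})$ coincides with the 
family of i.d. laws or, equivalently, with the family of laws with log of ch.f. 
$\psi = (\alpha, \Psi)$ defined by 

$$\psi(u) = iu\alpha + \int \Big( e^{iux} - 1 - \frac{iux}{1 + x^2} \Big) \frac{1 + x^2}{x^2}\, d\Psi(x),$$

where $\alpha \in R$, and $\Psi$ is a d.f. up to a multiplicative constant. 

2° $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{L}(X)$ with log of ch.f. necessarily of the form 
$\psi = (\alpha, \Psi)$ if, and only if, 

$$\Psi_n \xrightarrow{c} \Psi, \qquad \alpha_n \to \alpha,$$

where 

$$\alpha_n = \sum_k \Big\{ a_{nk} + \int \frac{x}{1 + x^2}\, d\bar{F}_{nk} \Big\}, \qquad \Psi_n(x) = \sum_k \int_{-\infty}^{x} \frac{y^2}{1 + y^2}\, d\bar{F}_{nk},$$

$$a_{nk} = \int_{|x| < \tau} x\, dF_{nk}, \qquad \bar{F}_{nk}(x) = F_{nk}(x + a_{nk}),$$

with $\tau > 0$ finite and arbitrarily fixed. 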


Proof. Every i.d. law is a limit law of the Central Limit Problem. 
Conversely, if, under the uan condition, $\prod_k f_{nk} \to f$ ch.f., then, on 
account of the boundedness lemma, the comparison lemma applies and 
$e^{\psi_n} \to f$. Thus, on the one hand, by the closure theorem for i.d. laws 
$f = e^{\psi}$ is i.d. and 1° is proved. On the other hand, $\psi_n \to \psi$ and hence, 
by the convergence theorem for i.d. laws, $\Psi_n \xrightarrow{c} \Psi$, $\alpha_n \to \alpha$, and the 
"only if" part of 2° is proved. 

Conversely, if $\alpha_n \to \alpha$ and $\Psi_n \xrightarrow{c} \Psi$, so that 

$$\operatorname{Var} \Psi_n = \sum_k \int \frac{x^2}{1 + x^2}\, d\bar{F}_{nk} \to \operatorname{Var} \Psi < \infty$$

and the comparison lemma applies, then $\psi_n \to \psi$, hence $\prod_k f_{nk} \to e^{\psi}$, 
and the "if" part of 2° is proved. This terminates the proof. 
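
To see the quantities $\alpha_n$ and $\Psi_n$ of the theorem in a concrete case, the sketch below evaluates them for an assumed example array: $n$ independent summands taking the values $\pm 1/\sqrt{n}$ with probability $1/2$ each. By symmetry $a_{nk} = 0$ for any $\tau > 0$, so $\bar{F}_{nk} = F_{nk}$. The function names are hypothetical.

```python
def alpha_n(n):
    """alpha_n = sum_k {a_nk + int x/(1+x^2) dF-bar_nk} for summands
    equal to +-1/sqrt(n) with probability 1/2 (assumed example; a_nk = 0)."""
    x = n ** -0.5
    per_summand = 0.5 * (x / (1 + x * x)) + 0.5 * (-x / (1 + x * x))
    return n * per_summand


def psi_n_variation(n):
    """Var Psi_n = sum_k int y^2/(1+y^2) dF-bar_nk for the same array."""
    x = n ** -0.5
    return n * (x * x / (1 + x * x))


def psi_n_mass_outside(n, eps):
    """Mass that Psi_n assigns to {|y| >= eps}; the atoms sit at +-1/sqrt(n)."""
    x = n ** -0.5
    return psi_n_variation(n) if x >= eps else 0.0


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, alpha_n(n), psi_n_variation(n), psi_n_mass_outside(n, 0.05))
```

Here $\alpha_n = 0$ and $\operatorname{Var} \Psi_n = n/(n + 1) \to 1$, with all the mass of $\Psi_n$ concentrating at the origin. The limit $\Psi$ is therefore a unit jump at $0$, for which $\psi(u) = -u^2/2$, the normal law $\mathcal{N}(0, 1)$.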
Extension. It may happen that under the uan condition, the sequence 
$\mathcal{L}(\sum_k X_{nk})$ does not converge, yet the sequence $\mathcal{L}(\sum_k X_{nk} - a_n)$ 
converges for suitably chosen constants $a_n$; this is the situation in the 
Bernoulli case and, more generally, in the classical limit problem where 
$X_{nk} = X_k/b_n$ with $b_n = n$ or $s_n$. Then $\prod_k f_{nk}(u)$ is replaced by 
$e^{-iua_n} \prod_k f_{nk}(u)$ and the boundedness lemma can still be used, since it 
refers only to the moduli of products. On the other hand, the sums in 
the comparison lemma can be written $\log \{e^{-iua_n} \prod_k f_{nk}(u)\} - 
\{-iua_n + \psi_n(u)\}$. Since $-iua_n + \psi_n(u)$ is still a $\psi$-function, the 
Central Limit theorem remains valid, provided $\alpha_n$ is replaced by $\alpha_n - a_n$, 
and the theorem can be stated as follows: 


B. EXTENDED CENTRAL LIMIT THEOREM. Let $X_{nk}$ be uan independent
summands.
1° The family of limit laws of sequences $\mathcal{L}(\sum_k X_{nk} - a_n)$ coincides
with the family of i.d. laws.
2° There exist constants $a_n$ such that the sequence $\mathcal{L}(\sum_k X_{nk} - a_n)$
converges if, and only if, $\Psi_n \xrightarrow{c}$ some $\Psi$, where

$$\Psi_n(x) = \sum_k \int_{-\infty}^{x} \frac{y^2}{1+y^2}\,d\bar F_{nk}.$$

Then all admissible $a_n$ are of the form $a_n = \alpha_n - \alpha + o(1)$ where $\alpha$
is an arbitrary finite number and
$\alpha_n = \sum_k \big\{a_{nk} + \int \frac{x}{1+x^2}\,d\bar F_{nk}\big\}$, and
all possible limit laws have for log of ch.f. $\psi = (\alpha, \Psi)$.

23.4 Central convergence criterion. The convergence criterion 22.3A
2° is expressed in terms of expressions twice removed from the primary
datum (the d.f.'s of the summands), and the probabilistic meaning of
these expressions is somewhat hidden. We transform it by unpleasant
but elementary computations as follows:


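A. CENTRAL CONVERGENCE CRITERION. If $X_{nk}$ are uan independent
summands, then

$$\prod_k f_{nk} \to f = e^{\psi}, \qquad \psi = (\alpha, \Psi),$$

if, and only if,

(i) at every continuity point $x \ne 0$ of $\Psi$

$$\sum_k F_{nk}(x) \to \int_{-\infty}^{x} \frac{1+y^2}{y^2}\,d\Psi \quad \text{for } x < 0,$$

$$\sum_k \{1 - F_{nk}(x)\} \to \int_{x}^{+\infty} \frac{1+y^2}{y^2}\,d\Psi \quad \text{for } x > 0$$

(ii) as $n \to \infty$ and then $\epsilon \to 0$

$$\sum_k \Big\{\int_{|x|<\epsilon} x^2\,dF_{nk} - \Big(\int_{|x|<\epsilon} x\,dF_{nk}\Big)^2\Big\} \to \Psi(+0) - \Psi(-0)$$

(iii) for a fixed $\tau > 0$ such that $\pm\tau$ are continuity points of $\Psi$

$$\sum_k \int_{|x|<\tau} x\,dF_{nk} \to \alpha + \int_{|x|<\tau} x\,d\Psi - \int_{|x|\ge\tau} \frac{1}{x}\,d\Psi.$$

The iterated limit in (ii) is the generalized iterated limit $\lim_{\epsilon \to 0} \lim_n$.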
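Proof. We have to prove that the three stated conditions are equivalent to

(C) $$\Psi_n \xrightarrow{c} \Psi \quad \text{with} \quad d\Psi_n(x) = \frac{x^2}{1+x^2} \sum_k d\bar F_{nk}$$

and

(C') $$\sum_k \Big\{a_{nk} + \int \frac{x}{1+x^2}\,d\bar F_{nk}\Big\} \to \alpha, \quad \text{with} \quad a_{nk} = \int_{|x|<\tau} x\,dF_{nk}, \quad \bar F_{nk}(x) = F_{nk}(x + a_{nk}).$$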




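1° Let $x$ be continuity points of $\Psi$. It is readily seen that condition
(C) can be written as follows:

$$\Psi_n(x) \to \Psi(x) \quad \text{for } x < 0,$$
$$\Psi_n(+\infty) - \Psi_n(x) \to \Psi(+\infty) - \Psi(x) \quad \text{for } x > 0$$

and, as $n \to \infty$ and then $\epsilon \to 0$,

$$\Psi_n(+\epsilon) - \Psi_n(-\epsilon) \to \Psi(+0) - \Psi(-0).$$

It follows, upon replacing $\Psi_n$ by its defining expression and applying
the Helly-Bray theorem, that (C) is equivalent to

(C₁) $$\sum_k \bar F_{nk}(x) \to \int_{-\infty}^{x} \frac{1+y^2}{y^2}\,d\Psi \quad \text{for } x < 0,$$

$$\sum_k \{1 - \bar F_{nk}(x)\} \to \int_{x}^{+\infty} \frac{1+y^2}{y^2}\,d\Psi \quad \text{for } x > 0$$

and

(C₂) $$\sum_k \int_{|x|<\epsilon} \frac{x^2}{1+x^2}\,d\bar F_{nk} \to \Psi(+0) - \Psi(-0)$$

as $n \to \infty$ and then $\epsilon \to 0$.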


Let $a_n = \max_k \int_{|x|<\tau} |x|\,dF_{nk}$ so that $|a_{nk}| \le a_n \to 0$. Since

$$\sum_k F_{nk}(x - a_n) \le \sum_k \bar F_{nk}(x) \le \sum_k F_{nk}(x + a_n),$$

and the continuity points $x$ of $\Psi$ are continuity points of the integrals
in (C₁), it follows at once that the first parts of (i) and (C₁) are equivalent;
similarly for the second parts. Thus (C₁) is equivalent to (i).

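2° Since

$$\frac{1}{1+\epsilon^2} \sum_k \int_{|x|<\epsilon} x^2\,d\bar F_{nk} \le \sum_k \int_{|x|<\epsilon} \frac{x^2}{1+x^2}\,d\bar F_{nk} \le \sum_k \int_{|x|<\epsilon} x^2\,d\bar F_{nk},$$

condition (C₂) is equivalent to

$$\sum_k \int_{|x|<\epsilon} x^2\,d\bar F_{nk} \to \Psi(+0) - \Psi(-0) \quad \text{as } n \to \infty \text{ and then } \epsilon \to 0.$$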


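But, on account of (i), as $n \to \infty$ and then $\epsilon \to 0$,

$$\Big|\sum_k \int_{|x|<\epsilon} x^2\,d\bar F_{nk} - \sum_k \int_{|x|<\epsilon} (x - a_{nk})^2\,dF_{nk}\Big|$$

$$\le \sum_k \Big(\int_{|x|\ge\epsilon,\ |x-a_{nk}|<\epsilon} + \int_{|x|<\epsilon,\ |x-a_{nk}|\ge\epsilon}\Big)(x - a_{nk})^2\,dF_{nk} \le 4\epsilon^2 \sum_k \int_{\epsilon/2 \le |x| \le 2\epsilon} dF_{nk} \to 0$$

and, since $a_n \to 0$, we have, for $\epsilon < \tau$,

$$\Big|\sum_k \int_{|x|<\epsilon} (x - a_{nk})^2\,dF_{nk} - \sum_k \Big\{\int_{|x|<\epsilon} x^2\,dF_{nk} - \Big(\int_{|x|<\epsilon} x\,dF_{nk}\Big)^2\Big\}\Big|$$

$$= \Big|\sum_k \Big(\int_{\epsilon\le|x|<\tau} x\,dF_{nk}\Big)^2 - \sum_k a_{nk}^2 \int_{|x|\ge\epsilon} dF_{nk}\Big| \le (\tau a_n + a_n^2) \sum_k \int_{|x|\ge\epsilon} dF_{nk} \to 0.$$

Therefore, under (i) or its equivalent (C₁), condition (C₂) is equivalent
to (ii). Thus, condition (C) is equivalent to (i) and (ii).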

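3° It remains to prove that, under (C) or its equivalent (i) and (ii),
condition (C') is equivalent to (iii). Since

$$\sum_k \int \frac{x}{1+x^2}\,d\bar F_{nk} = \sum_k \int_{|x|<\tau} x\,d\bar F_{nk} - \sum_k \int_{|x|<\tau} \frac{x^3}{1+x^2}\,d\bar F_{nk} + \sum_k \int_{|x|\ge\tau} \frac{x}{1+x^2}\,d\bar F_{nk}$$

and, $\pm\tau$ being continuity points of $\Psi$, we have, by the Helly-Bray
theorem,

$$\sum_k \int_{|x|<\tau} \frac{x^3}{1+x^2}\,d\bar F_{nk} = \int_{|x|<\tau} x\,d\Psi_n \to \int_{|x|<\tau} x\,d\Psi,$$

$$\sum_k \int_{|x|\ge\tau} \frac{x}{1+x^2}\,d\bar F_{nk} = \int_{|x|\ge\tau} \frac{1}{x}\,d\Psi_n \to \int_{|x|\ge\tau} \frac{1}{x}\,d\Psi,$$

it suffices to prove that $\sum_k \int_{|x|<\tau} x\,d\bar F_{nk} \to 0$. This assertion follows
from the fact that $a_n \to 0$ and $\pm\tau$, being continuity points of $\Psi$, are
continuity points of integrals in (i), so that, by (i),

$$\Big|\sum_k \int_{|x|<\tau} x\,d\bar F_{nk}\Big| = \Big|\sum_k \int_{|x-a_{nk}|<\tau} (x - a_{nk})\,dF_{nk}\Big|$$

$$\le \Big|\sum_k \int_{|x|<\tau} (x - a_{nk})\,dF_{nk}\Big| + \Big|\sum_k \Big\{\int_{|x-a_{nk}|<\tau} - \int_{|x|<\tau}\Big\}(x - a_{nk})\,dF_{nk}\Big|$$

$$\le a_n \sum_k \int_{|x|\ge\tau} dF_{nk} + (\tau + a_n) \sum_k \int_{\tau - a_n \le |x| < \tau + a_n} dF_{nk} \to 0.$$

This terminates the proof.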


Remark 1. In the course of the proof, it was found that condition
(i) can be written with $\bar F_{nk}$ instead of $F_{nk}$ and condition (ii) is
equivalent to

(ii') $$\sum_k \int_{|x|<\epsilon} x^2\,d\bar F_{nk} \to \Psi(+0) - \Psi(-0)$$

as $n \to \infty$ and then $\epsilon \to 0$.


Remark 2. In conditions (ii) or (ii'), the passages to the limit can
be taken indifferently to be $\lim_{\epsilon \to 0} \limsup_n$ or $\lim_{\epsilon \to 0} \liminf_n$, instead of
the generalized iterated limit; we leave the verification to the reader.


Upon using the extended Central Limit theorem, the central con- 
vergence criterion extends at once to sums with variable origin, as 
follows: 


B. EXTENDED CENTRAL CONVERGENCE CRITERION. If $X_{nk}$ are uan independent
summands, then there exist constants $a_n$ such that
$e^{-iua_n}\prod_k f_{nk}(u) \to e^{\psi(u)}$ where $\psi = (\alpha, \Psi)$ if, and only if,
conditions (i) and (ii) of the central convergence criterion hold.
Then the admissible $a_n$ are of the form

$$a_n = \sum_k \int_{|x|<\tau} x\,dF_{nk} - \alpha - \int_{|x|<\tau} x\,d\Psi + \int_{|x|\ge\tau} \frac{1}{x}\,d\Psi + o(1)$$

where $\pm\tau$ are fixed continuity points of $\Psi$.
This criterion implies properties of $\min_k X_{nk}$ and $\max_k X_{nk}$. In fact,
it takes then a more intuitive form, as follows:

C. EXTREMA CRITERION. Let $X_{nk}$ be uan independent summands, and
let $X_{nk}^{\epsilon} = X_{nk}$ or $0$ according as $|X_{nk}| < \epsilon$ or $|X_{nk}| \ge \epsilon$.
The sequence $\mathcal{L}(\sum_k X_{nk} - a_n)$ converges for suitable constants $a_n$ if,
and only if, the sequences $\mathcal{L}(\min_k X_{nk})$, $\mathcal{L}(\max_k X_{nk})$ and
$\sum_k \sigma^2 X_{nk}^{\epsilon}$ converge as $n \to \infty$ and then $\epsilon \to 0$.
More precisely, $\mathcal{L}(\sum_k X_{nk} - a_n) \to \mathcal{L}(X)$ with $\mathcal{L}(X)$ necessarily an
i.d. law $(\alpha, \Psi)$ if, and only if, as $n \to \infty$ and then $\epsilon \to 0$,

$$\sum_k \sigma^2 X_{nk}^{\epsilon} \to \Psi(+0) - \Psi(-0)$$

and

$$\mathcal{L}(\min_k X_{nk}) \to \mathcal{L}(Y), \qquad \mathcal{L}(\max_k X_{nk}) \to \mathcal{L}(Z)$$

with
$F_Y(x) = 1 - e^{-L(x)}$ or $1$ and $F_Z(x) = 0$ or $e^{L(x)}$, according as $x < 0$
or $x > 0$,
where

$$L(x) = \int_{-\infty}^{x} \frac{1+y^2}{y^2}\,d\Psi(y), \quad x < 0; \qquad L(x) = -\int_{x}^{+\infty} \frac{1+y^2}{y^2}\,d\Psi(y), \quad x > 0.$$

Proof. Let $G_n$ be the d.f. of $\min_{k \le k_n} X_{nk}$, so that
$1 - G_n = \prod_{k=1}^{k_n} (1 - F_{nk})$.
For every fixed $x > 0$, $F_{nk}(x) \to 1$ uniformly in $k$ and, hence, $G_n(x) \to 1$.
For every fixed $x < 0$, $F_{nk}(x) \to 0$ uniformly in $k$ and, hence, for
$n$ sufficiently large,

$$\log (1 - G_n(x)) = \sum_k \log (1 - F_{nk}(x)) = -(1 + o(1)) \sum_k F_{nk}(x).$$

Therefore, the assertion relative to $F_Y$ is equivalent to the first part of
condition (i) of the central convergence criterion; similarly for the
assertion relative to $F_Z$. The theorem follows.
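
As a hedged numerical illustration of the assertion relative to $F_Z$ (a sketch, not taken from the text), consider the hypothetical array $X_{nk} = \xi_k / n^{1/\gamma}$, $k = 1, \dots, n$, where the $\xi_k$ are i.i.d. symmetric with $P[\xi \ge t] = \frac{1}{2} t^{-\gamma}$ for $t \ge 1$. Then $\sum_k \{1 - F_{nk}(x)\} = \frac{1}{2} x^{-\gamma}$ for large $n$, so that $L(x) = -\frac{1}{2} x^{-\gamma}$ for $x > 0$ and the d.f. of the maximum should approach $e^{L(x)}$. All function names below are assumptions made for this example.

```python
import math

def tail_prob(x: float, n: int, gamma: float) -> float:
    """P[X_nk >= x] for x > 0, with X_nk = xi / n**(1/gamma) and
    P[xi >= t] = 0.5 * t**(-gamma) for t >= 1 (assumed symmetric Pareto-type law)."""
    t = x * n ** (1.0 / gamma)
    # below t = 1 the positive half of the law is entirely to the right of t
    return 0.5 * t ** (-gamma) if t >= 1.0 else 0.5

def df_of_max(x: float, n: int, gamma: float) -> float:
    """Exact d.f. of max_k X_nk at x > 0: product of the n identical d.f.'s."""
    return (1.0 - tail_prob(x, n, gamma)) ** n

def limit_df_of_max(x: float, gamma: float) -> float:
    """Limit e^{L(x)} with L(x) = -0.5 * x**(-gamma) for x > 0."""
    return math.exp(-0.5 * x ** (-gamma))
```

Comparing `df_of_max(x, n, gamma)` with `limit_df_of_max(x, gamma)` for increasing `n` shows the gap shrinking at rate roughly $1/n$ in this example.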

23.5 Normal, Poisson, and degenerate convergence. We apply now 
the central convergence criterion to the three first-discovered limit 
types. We set 


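$$a_{nk}(\tau) = \int_{|x|<\tau} x\,dF_{nk}, \qquad \sigma_{nk}^2(\tau) = \int_{|x|<\tau} x^2\,dF_{nk} - \Big(\int_{|x|<\tau} x\,dF_{nk}\Big)^2.$$

1° A normal law $\mathcal{N}(\alpha, \sigma^2)$ corresponds to $\psi(u) = iu\alpha - \frac{\sigma^2}{2} u^2$, that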


is, $\psi = (\alpha, \Psi)$ where $\Psi(x) = 0$ or $\sigma^2$ according as $x < 0$ or $x > 0$.

NORMAL CONVERGENCE CRITERION. If $X_{nk}$ are independent summands,
then, for every $\epsilon > 0$,

$$\mathcal{L}\Big(\sum_k X_{nk}\Big) \to \mathcal{N}(\alpha, \sigma^2) \quad \text{and} \quad \max_k P[|X_{nk}| \ge \epsilon] \to 0$$

if, and only if, for every $\epsilon > 0$ and a $\tau > 0$,

(i) $$\sum_k P[|X_{nk}| \ge \epsilon] \to 0$$

(ii) $$\sum_k \sigma_{nk}^2(\tau) \to \sigma^2, \qquad \sum_k a_{nk}(\tau) \to \alpha.$$

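Proof. We have, under (i),

$$\max_k P[|X_{nk}| \ge \epsilon] \le \sum_k P[|X_{nk}| \ge \epsilon] \to 0.$$

Furthermore, always under (i), if $\epsilon < \tau$, then

$$\Big|\sum_k \sigma_{nk}^2(\tau) - \sum_k \sigma_{nk}^2(\epsilon)\Big| \le \sum_k \int_{\epsilon\le|x|<\tau} x^2\,dF_{nk} + 2\tau \sum_k \Big|\int_{\epsilon\le|x|<\tau} x\,dF_{nk}\Big| \le 3\tau^2 \sum_k \int_{\epsilon\le|x|<\tau} dF_{nk} \to 0$$

and the same is true if $\epsilon > \tau$; it suffices to interchange $\epsilon$ and $\tau$ in the
foregoing chain of inequalities. Upon taking into account these
consequences of (i), the foregoing criterion follows from the central
convergence criterion applied to the limit law $\mathcal{N}(\alpha, \sigma^2)$.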
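
The two conditions of the normal convergence criterion can be evaluated exactly for simple arrays. The sketch below is illustrative only and assumes the classical array $X_{nk} = \xi_k/\sqrt{n}$, $k = 1, \dots, n$, with the $\xi_k$ i.i.d. of a discrete law supplied by the caller; the function name and the dictionary encoding of the law are hypothetical choices.

```python
import math

def criterion_sums(n: int, eps: float, tau: float, law: dict):
    """For X_nk = xi_k / sqrt(n), k = 1..n, xi_k i.i.d. with discrete law
    `law` (value -> probability), return the three sums of the criterion:
    sum_k P[|X_nk| >= eps], sum_k sigma_nk^2(tau), sum_k a_nk(tau)."""
    scale = math.sqrt(n)
    # probability that one summand is at least eps in absolute value
    tail = sum(p for v, p in law.items() if abs(v / scale) >= eps)
    # first and second moments of one summand truncated at tau
    m1 = sum(p * v / scale for v, p in law.items() if abs(v / scale) < tau)
    m2 = sum(p * (v / scale) ** 2 for v, p in law.items() if abs(v / scale) < tau)
    # the n summands are identically distributed, so each sum is n times one term
    return n * tail, n * (m2 - m1 * m1), n * m1
```

For a centered law with unit variance, such as `{-2.0: 0.125, 0.0: 0.75, 2.0: 0.125}`, the first sum vanishes as soon as $2/\sqrt{n} < \epsilon$, while the other two equal $1$ and $0$, in agreement with convergence to $\mathcal{N}(0, 1)$.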


COROLLARY. If $X_{nk}$ are independent summands and the sequence
$\mathcal{L}(\sum_k X_{nk})$ converges, then the limit law is normal and the uan condition
is satisfied if, and only if, $\max_k |X_{nk}| \xrightarrow{P} 0$.

Upon setting $p_{nk} = P[|X_{nk}| \ge \epsilon]$, it suffices to observe that, because
of the independence of the summands,

$$P[\max_k |X_{nk}| \ge \epsilon] = 1 - \prod_k (1 - p_{nk}).$$

For, upon applying the elementary inequality

$$1 - \exp\Big[-\sum_k p_{nk}\Big] \le 1 - \prod_k (1 - p_{nk}) \le \sum_k p_{nk},$$

it follows that the asserted condition is equivalent to condition (i) of
the above criterion.

2° The Poisson law $\mathcal{P}(\lambda)$ corresponds to $\psi(u) = \lambda(e^{iu} - 1)$ and,
consequently, to $\psi = (\frac{\lambda}{2}, \Psi)$ with $\Psi(x) = 0$ or $\frac{\lambda}{2}$ according as $x < 1$
or $x > 1$. Upon applying the central convergence criterion and observing
that the condition relative to the $\sigma_{nk}^2(\epsilon)$ reduces exactly as in the
normal case, we obtain the


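POISSON CONVERGENCE CRITERION. If $X_{nk}$ are uan independent summands,
then $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{P}(\lambda)$ if, and only if, for every $\epsilon \in (0, 1)$ and
a $\tau \in (0, 1)$,

(i) $$\sum_k \int_{|x|\ge\epsilon,\ |x-1|\ge\epsilon} dF_{nk} \to 0 \quad \text{and} \quad \sum_k \int_{|x-1|<\epsilon} dF_{nk} \to \lambda$$

(ii) $$\sum_k \sigma_{nk}^2(\tau) \to 0 \quad \text{and} \quad \sum_k a_{nk}(\tau) \to 0.$$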
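
A quick numerical check of this criterion, offered as a sketch under stated assumptions: take the Bernoulli array $P[X_{nk} = 1] = \lambda/n$, $P[X_{nk} = 0] = 1 - \lambda/n$, $k = 1, \dots, n$, for which $\sum_k X_{nk}$ is binomial. The helper names below are hypothetical. The first function evaluates the four sums of the criterion for any row of discrete laws; the second measures the distance $\sup_A |P[\sum_k X_{nk} \in A] - \mathcal{P}(\lambda)(A)|$ using recurrences for the two probability mass functions.

```python
import math

def poisson_criterion_sums(row, eps: float, tau: float):
    """row: list of discrete laws (dict value -> probability), one per summand.
    Returns (sum_k P[|x| >= eps, |x-1| >= eps], sum_k P[|x-1| < eps],
             sum_k sigma_nk^2(tau), sum_k a_nk(tau))."""
    away = near_one = var_sum = mean_sum = 0.0
    for law in row:
        away += sum(p for x, p in law.items() if abs(x) >= eps and abs(x - 1) >= eps)
        near_one += sum(p for x, p in law.items() if abs(x - 1) < eps)
        m1 = sum(p * x for x, p in law.items() if abs(x) < tau)
        m2 = sum(p * x * x for x, p in law.items() if abs(x) < tau)
        var_sum += m2 - m1 * m1
        mean_sum += m1
    return away, near_one, var_sum, mean_sum

def total_variation_binomial_poisson(n: int, lam: float) -> float:
    """sup over sets A of |Binomial(n, lam/n)(A) - Poisson(lam)(A)|."""
    p = lam / n
    b = (1.0 - p) ** n        # binomial mass at 0
    q = math.exp(-lam)        # Poisson mass at 0
    abs_diff, q_mass = abs(b - q), q
    for j in range(n):
        b *= (n - j) / (j + 1) * p / (1.0 - p)   # binomial mass at j + 1
        q *= lam / (j + 1)                        # Poisson mass at j + 1
        abs_diff += abs(b - q)
        q_mass += q
    abs_diff += max(0.0, 1.0 - q_mass)            # Poisson mass beyond n
    return 0.5 * abs_diff
```

For the Bernoulli row the first, third and fourth sums are zero and the second equals $\lambda$, so the criterion is satisfied; the computed distance decreases as `n` grows.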
k k 


3° The degenerate law $\mathcal{L}(0)$ can be considered as a degenerate normal
$\mathcal{N}(0, 0)$ so that the normal convergence criterion reduces to the

DEGENERATE CONVERGENCE CRITERION. If $X_{nk}$ are independent summands,
then $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{L}(0)$ and the uan condition is satisfied if, and
only if, for every $\epsilon > 0$ and a $\tau > 0$

(i) $$\sum_k \int_{|x|\ge\epsilon} dF_{nk} \to 0$$

(ii) $$\sum_k \sigma_{nk}^2(\tau) \to 0, \qquad \sum_k a_{nk}(\tau) \to 0.$$


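COROLLARY 1. If $X_n$ are independent summands and $b_n \uparrow \infty$, then
$\mathcal{L}(S_n/b_n) \to \mathcal{L}(0)$ if, and only if, for every $\epsilon > 0$

(i) $$\sum_k \int_{|x|\ge\epsilon b_n} dF_k \to 0,$$

(ii) $$\frac{1}{b_n^2} \sum_k \Big\{\int_{|x|<b_n} x^2\,dF_k - \Big(\int_{|x|<b_n} x\,dF_k\Big)^2\Big\} \to 0,$$

(iii) $$\frac{1}{b_n} \sum_k \int_{|x|<b_n} x\,dF_k \to 0.$$

Because of the above criterion, taking $\tau = 1$ and observing that for
$X_{nk} = X_k/b_n$, $F_{nk}(x) = F_k(b_n x)$, it remains only to prove that
$\mathcal{L}(S_n/b_n) \to \mathcal{L}(0)$ implies the uan condition. This follows from the fact that
$P[|S_n/b_n| < \epsilon] > 1 - \delta$, for $n \ge n_\epsilon$ sufficiently large, implies that, for
$n > n_\epsilon$,

$$P\Big[\Big|\frac{X_n}{b_n}\Big| < 2\epsilon\Big] \ge P\Big[\Big|\frac{S_n}{b_n}\Big| < \epsilon,\ \frac{b_{n-1}}{b_n}\Big|\frac{S_{n-1}}{b_{n-1}}\Big| < \epsilon\Big] \ge P\Big[\Big|\frac{S_n}{b_n}\Big| < \epsilon,\ \Big|\frac{S_{n-1}}{b_{n-1}}\Big| < \epsilon\Big] \ge 1 - 2\delta.$$

Remark. For the degenerate convergence criterion, (ii) and (i) with
$\epsilon = \tau$ imply that $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{L}(0)$. For, as in 21.2A, by Tchebichev
inequality, (ii) implies that $\mathcal{L}(\sum_k X_{nk}^{\tau}) \to \mathcal{L}(0)$ and then, by 21.1b,
(i) implies that $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{L}(0)$.

In particular, in Corollary 1, we may take $\epsilon = 1$. Thus, for $b_n = n$, we
have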


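COROLLARY 2. If $X_n$ are independent summands, then $\mathcal{L}(S_n/n) \to \mathcal{L}(0)$
if, and only if,

(i) $$\sum_k \int_{|x|\ge n} dF_k \to 0,$$

(ii) $$\frac{1}{n^2} \sum_k \Big\{\int_{|x|<n} x^2\,dF_k - \Big(\int_{|x|<n} x\,dF_k\Big)^2\Big\} \to 0,$$

(iii) $$\frac{1}{n} \sum_k \int_{|x|<n} x\,dF_k \to 0.$$

This is the classical degenerate convergence criterion.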
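
As a worked illustration of the classical criterion (our own example, not from the text, with both tail laws chosen as assumptions), compare two i.i.d. symmetric cases. Condition (iii) holds trivially by symmetry in both; the first case fails (i), while the second satisfies all three conditions although its expectation is infinite.

```latex
% Case 1 (assumed law): P[|X| >= x] = 1/x for x >= 1.
\sum_{k=1}^{n} \int_{|x|\ge n} dF_k = n \cdot \frac{1}{n} = 1 \not\to 0,
\qquad\text{so (i) fails and } \mathcal{L}(S_n/n) \not\to \mathcal{L}(0).

% Case 2 (assumed law): P[|X| >= x] = e/(x \log x) for x >= e.
\sum_{k=1}^{n} \int_{|x|\ge n} dF_k = \frac{e}{\log n} \to 0,
\qquad
\frac{1}{n^2}\sum_{k=1}^{n} \int_{|x|<n} x^2\,dF_k
 \le \frac{1}{n}\int_0^n 2t\,P[|X|\ge t]\,dt
 \le \frac{1}{n}\Big(e^2 + \int_e^n \frac{2e}{\log t}\,dt\Big) \to 0,

% while the expectation is infinite:
E|X| = \int_0^\infty P[|X|\ge t]\,dt \ge \int_e^\infty \frac{e\,dt}{t\log t} = \infty .
```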
The reader is invited to specialize 23.4C to the three foregoing cases. 
In particular, it implies the corollary to the normal convergence criterion. 
As for the Poisson case, $dL(x) = 0$ or $\lambda$ according as $x \ne 1$ or $x = 1$ so
that

If $\mathcal{L}(\sum_k X_{nk}) \to \mathcal{L}(X)$, then $\mathcal{L}(X) = \mathcal{P}(\lambda)$ if and only if
$\mathcal{L}(\min_k X_{nk}) \to \mathcal{L}(0)$ and $\mathcal{L}(\max_k X_{nk}) \to \mathcal{L}(0, 1)$ with two values
0 and 1 only of pr. $e^{-\lambda}$ and $1 - e^{-\lambda}$, respectively.

*§ 24. NORMED SUMS


*24.1 The problem. Let $\frac{S_n}{b_n} - a_n$ be normed sums with d.f. $G_n$ and
ch.f. $g_n$, where $S_n = \sum_{k=1}^{n} X_k$ are consecutive sums of independent r.v.'s
$X_k$ with d.f.'s $F_k$ and ch.f.'s $f_k$, and where $a_n$, $b_n > 0$ are finite numbers;
thus

$$g_n(u) = e^{-iua_n} \prod_{k=1}^{n} f_k\Big(\frac{u}{b_n}\Big).$$

In what follows $k$ runs over $1, \dots, n$; $n = 1, 2, \dots$.

If the $X_{nk} = X_k/b_n$ obey the uan condition:

$$\max_k P[|X_k| \ge \epsilon b_n] \to 0 \quad \text{or} \quad \max_k \int \frac{x^2}{b_n^2 + x^2}\,dF_k(x) \to 0 \quad \text{or} \quad \max_k \Big|f_k\Big(\frac{u}{b_n}\Big) - 1\Big| \to 0,$$

then, according to the extended Central Limit Theorem, all possible
limit laws of sequences $\frac{S_n}{b_n} - a_n$ of normed sums form a family $\mathfrak{N}$ of
i.d. laws, and the extended central convergence criterion applies with
$F_{nk}(x) = F_k(b_n x)$.

However, in the case of normed sums, new problems arise.

1° Given a sequence $X_n$ of independent r.v.'s, find whether there
exist sequences $a_n$ and $b_n > 0$ such that the uan condition (for the
$X_k/b_n$) is satisfied and $g_n \to f$ ch.f., necessarily of the form $e^{\psi}$ with
$\psi = (\alpha, \Psi)$; and if such sequences exist, then characterize them.

2° Characterize the family $\mathfrak{N}$; in other words, characterize those
i.d. ch.f.'s $e^{\psi}$ and the corresponding functions $\Psi$ which represent limit
laws of normed sums obeying the uan condition.

But on the one hand, according to the convergence of types theorem,
there always exist sequences $a_n$ and $b_n > 0$ such that the limit laws of
$\frac{S_n}{b_n} - a_n$ are degenerate and, on the other hand, all degenerate laws
belong to $\mathfrak{N}$: $e^{iua} = (e^{iua/n})^n$. Thus, whenever convenient, we can and do
exclude degenerate limit laws from our considerations.


a. If $g_n \to f$ nondegenerate ch.f., then the uan condition for the $X_k/b_n$
implies that $b_n \to \infty$ and $b_{n+1}/b_n \to 1$.

Proof. We have

$$g_n(u) = e^{-iua_n} \prod_k f_k\Big(\frac{u}{b_n}\Big) \to f(u) \text{ nondegenerate}.$$

If $b_n \not\to \infty$, then the sequence $b_n$ contains a bounded subsequence
and, by the Bolzano-Weierstrass lemma, this subsequence contains
another sequence $b_{n'} \to b$ finite as $n' \to \infty$. Setting $u_{n'} = b_{n'} u$, the
uan condition implies that for every $k$, $f_k(u) = f_k(u_{n'}/b_{n'}) \to 1$; hence,
$f_k \equiv 1$ and $|f| \equiv 1$. This contradicts the nondegeneracy assumption so
that, ab contrario, $b_n \to \infty$.

Since $X_{n+1}/b_{n+1} \xrightarrow{P} 0$, it follows by the law-equivalence lemma that
the limit laws of the sequences $\frac{S_n}{b_n} - a_n$ and
$\frac{S_n}{b_{n+1}} - a_{n+1} = \frac{S_{n+1}}{b_{n+1}} - a_{n+1} - \frac{X_{n+1}}{b_{n+1}}$
are the same. Thus $e^{iua'_n} g_n(b'_n u) \to f(u)$ as $n \to \infty$, for suitable
constants $a'_n$, with $b'_n = b_n/b_{n+1}$ and $f$ nondegenerate. It follows, by the
corollary to the convergence of types theorem, that $b_{n+1}/b_n \to 1$. The proof is
complete.

*24.2 Norming sequences. We have at our disposal the necessary
tools to solve the problem of existence and determination of norming
sequences $a_n$ and $b_n > 0$. Given the summands, we know, according
to the convergence of types theorem, that 1° all the limit laws belong,
if they exist, to the positive type of one i.d. law and 2° it suffices to
find one pair of such sequences. Furthermore, on account of the extended
convergence criterion (with $X_{nk} = X_k/b_n$), 3° if there exists
a limit i.d. positive type, then the $a_n$ are determined by the expression
given there, 4° the uan condition is satisfied and $g_n \to e^{\psi}$ if, and only if,

$$\max_k \int \frac{x^2}{b_n^2 + x^2}\,dF_k(x) \to 0 \quad \text{and} \quad \Psi_n \xrightarrow{c} \Psi$$

where $\Psi_n$ are defined on $R$ by

(D) $$\Psi_n(x) = \sum_k \int_{-\infty}^{b_n x} \frac{y^2}{b_n^2 + y^2}\,dF_k(y + a_{nk}), \qquad a_{nk} = \int_{|x|<b_n\tau} x\,dF_k(x)$$

with $\pm\tau \ne 0$ fixed continuity points of $\Psi$ (we shall see later that any
$\tau$ is admissible, so that we may set, say, $\tau = 1$). The theorem below
completes the answer. As usual, the superscript "s" will denote the
operation of symmetrization.

A. NORMING THEOREM. There exist sequences $b_n$ such that
$\mathcal{L}\big(\frac{S_n}{b_n} - a_n\big) \to \mathcal{L}(X)$ for suitable $a_n$ if, and only if, there exists a $\Psi$
such that, upon setting in (D), $b_n = b'_n > 0$ determined by

$$\frac{1}{2} \sum_{k=1}^{n} \int \frac{x^2}{b_n'^2 + x^2}\,dF_k^s(x) = \Psi(+\infty),$$

we have

(i) $$\max_k \int \frac{x^2}{b_n'^2 + x^2}\,dF_k(x) \to 0,$$

(ii) $$\Psi_n \xrightarrow{c} \Psi.$$

Proof. The "if" assertion follows by taking normed sums $\frac{S_n}{b'_n} - a_n$.


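Because of the corollary to the convergence of types theorem and of
the extended central convergence criterion, the "only if" assertion will
follow by proving that if $\mathcal{L}\big(\frac{S_n}{b_n} - a_n\big) \to \mathcal{L}(X)$ with ch.f. $e^{\psi}$,
$\psi = (\alpha, \Psi)$, then $b'_n/b_n \to 1$.
Upon symmetrizing, the hypothesis becomes $\mathcal{L}(S_n^s/b_n) \to \mathcal{L}(X^s)$ and
the corresponding $\Psi^s$ is defined by

$$\Psi^s(x) = \Psi(x) + \Psi(+\infty) - \Psi(-x + 0).$$

Thus $\Psi_n^s \xrightarrow{c} \Psi^s$ where $\Psi_n^s$ are defined by

$$\Psi_n^s(x) = \sum_k \int_{-\infty}^{b_n x} \frac{y^2}{b_n^2 + y^2}\,dF_k^s(y).$$

Upon using $\Psi^s(+\infty) = 2\Psi(+\infty)$, and (D) with $b_n$ replaced by $b'_n$, it
follows that

$$\Delta_n = \Big|\sum_{k=1}^{n} \int \frac{x^2}{b_n^2 + x^2}\,dF_k^s(x) - \sum_{k=1}^{n} \int \frac{x^2}{b_n'^2 + x^2}\,dF_k^s(x)\Big| \to 0.$$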


On the other hand, since degenerate limit laws are excluded, $\Psi^s$ does
not reduce to a constant. Therefore, there exists an $a > 0$ such that
$2\delta = \Psi^s(a) - \Psi^s(-a + 0) > 0$ and, hence, for $n \ge n_a$ sufficiently
large,

$$\sum_{k=1}^{n} \int_{-ab_n}^{+ab_n} \frac{x^2}{b_n^2 + x^2}\,dF_k^s(x) > \delta > 0.$$

It follows that

$$0 \leftarrow \Delta_n = |b_n^2 - b_n'^2| \sum_{k=1}^{n} \int \frac{x^2}{(b_n^2 + x^2)(b_n'^2 + x^2)}\,dF_k^s(x)$$

$$\ge \frac{|b_n^2 - b_n'^2|}{b_n'^2 + a^2 b_n^2} \sum_{k=1}^{n} \int_{-ab_n}^{+ab_n} \frac{x^2}{b_n^2 + x^2}\,dF_k^s(x) \ge \delta\,\frac{|(b_n/b'_n)^2 - 1|}{1 + a^2 (b_n/b'_n)^2} \ge 0$$

so that $b_n/b'_n \to 1$, and the proof is complete.

*24.3 Characterization of $\mathfrak{N}$. We characterize $\mathfrak{N}$ by a decomposability
property and, then, we characterize the corresponding functions $\Psi$.

In order to define the decomposability property we prove


a. If to a ch.f. $f$ there corresponds a number $c > 0$ and a nondegenerate
ch.f. $f_c$ such that, for every $u$, $f(u) = f(cu) f_c(u)$, then $c < 1$.


Proof. If $c = 1$, then $f_c \equiv 1$. If $c > 1$, then, replacing repeatedly
in the assumed relation $u$ by $\frac{u}{c}$ and $|f_c|$ by 1, we have

$$1 \ge |f(u)| \ge \Big|f\Big(\frac{u}{c}\Big)\Big| \ge \cdots \ge \Big|f\Big(\frac{u}{c^n}\Big)\Big| \to |f(0)| = 1$$

and $f$ is degenerate, so that $f_c$ is degenerate. The assertion follows
ab contrario.


We say that a law and its ch.f. $f$ are self-decomposable if, for every
$c \in (0, 1)$, there exists a ch.f. $f_c$ such that, for every $u$, $f(u) = f(cu) f_c(u)$.
Clearly, a degenerate ch.f. is self-decomposable and all its components
$f_c$ are also degenerate.
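
Two standard nondegenerate examples may help fix the definition; they are our own illustration rather than part of the text. For the normal and the exponential ch.f.'s the component $f_c(u) = f(u)/f(cu)$ can be written down explicitly and recognized as a ch.f.

```latex
% Normal law N(0,1): f(u) = e^{-u^2/2}. For 0 < c < 1,
f_c(u) = \frac{f(u)}{f(cu)} = e^{-(1-c^2)u^2/2},
\qquad\text{the ch.f. of } \mathcal{N}(0,\, 1-c^2).

% Exponential law with unit mean: f(u) = 1/(1 - iu). For 0 < c < 1,
f_c(u) = \frac{1 - icu}{1 - iu} = c + (1-c)\,\frac{1}{1 - iu},
\qquad\text{the ch.f. of the mixture } c\,\mathcal{L}(0) + (1-c)\,\mathcal{L}(X).
```

In the second case $f_c$ is a mixture of the law degenerate at 0 (weight $c$) and the exponential law itself (weight $1 - c$), hence a ch.f.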


b. If $f$ is self-decomposable, then $f \ne 0$.


Proof. If $f(2a) = 0$ and $f(u) \ne 0$ for $0 \le u < 2a$, then $f_c(2a) = 0$.
Upon replacing $t$ and $h$ by $a$ in

$$|f_c(t + h) - f_c(t)|^2 \le 2\{1 - \Re f_c(h)\},$$

we obtain

$$|f_c(a)|^2 \le 2\{1 - \Re f_c(a)\}.$$

This leads to a contradiction since, by letting $c \to 1$, we obtain
$f_c(a) = \frac{f(a)}{f(ca)} \to 1$ and the inequality becomes $1 \le 0$. The assertion
follows ab contrario.


A. SELF-DECOMPOSABILITY CRITERION. A law belongs to $\mathfrak{N}$ if, and
only if, it is self-decomposable.


Proof. A degenerate law certainly belongs to $\mathfrak{N}$, so that it suffices to
consider nondegenerate laws with ch.f. $f$.

1° If $f$ is self-decomposable, then let $X_k$ $(k = 1, \dots, n)$ be independent
r.v.'s, with ch.f. $f_k$ defined by

$$f_k(u) = f_{\frac{k-1}{k}}(ku) = \frac{f(ku)}{f((k-1)u)}.$$

Since $f_k\big(\frac{u}{n}\big) \to 1$ uniformly in $k$ and the ch.f. of $\frac{S_n}{n}$ is given by

$$\prod_{k=1}^{n} f_k\Big(\frac{u}{n}\Big) = f(u),$$

the "if" assertion follows.


2° Conversely, let $f$ belong to $\mathfrak{N}$. There exist normed sums $\frac{S_n}{b_n} - a_n$
with ch.f. $g_n$ such that, denoting by $f_k$ the ch.f. of summands $X_k$,

$$g_n(u) = e^{-iua_n} \prod_{k=1}^{n} f_k\Big(\frac{u}{b_n}\Big) \to f(u)$$

and, by 24.1a, $b_n \to \infty$, $\frac{b_{n+1}}{b_n} \to 1$. Then, given $c \in (0, 1)$, we can
make correspond to every integer $n$ an integer $m < n$ such that $\frac{b_m}{b_n} \to c$
and $m, n - m \to \infty$ as $n \to \infty$. Since

$$g_n(u) = \Big[e^{-iu a_m b_m/b_n} \prod_{k=1}^{m} f_k\Big(\frac{b_m}{b_n} \cdot \frac{u}{b_m}\Big)\Big] \cdot \Big[e^{-iu(a_n - a_m b_m/b_n)} \prod_{k=m+1}^{n} f_k\Big(\frac{u}{b_n}\Big)\Big]$$

where $g_n(u) \to f(u)$, and the first bracket converges to $f(cu)$, it follows
that the ch.f. $g_{m,n}$, whose values figure within the second bracket,
converges to the continuous function $f_c$ defined by $f_c(u) = \frac{f(u)}{f(cu)}$.
Therefore, by the continuity theorem, $f_c$ is a ch.f., and the proof is concluded.


COROLLARY. A self-decomposable ch.f. $f$ and its components $f_c$ are i.d.

Proof. Since $f$ belongs to $\mathfrak{N}$, $f$ is i.d. On the other hand, upon taking
for $f_k$ the ch.f. of r.v.'s $X_k$ defined in 1° and $m < n$ such that $\frac{m}{n} \to c$,
we have

$$f(u) = \prod_{k=1}^{m} f_k\Big(\frac{m}{n} \cdot \frac{u}{m}\Big) \prod_{k=m+1}^{n} f_k\Big(\frac{u}{n}\Big).$$

The first product converges to $f(cu)$; the second one converges to $f_c(u)$.
Thus, $f_c$ is ch.f. of the limit law of sums $\sum_{k=m+1}^{n} X_{nk}$, where the summands
$X_{nk} = \frac{X_k}{n}$ obey the uan condition. Therefore, $f_c$ is an i.d. ch.f., and the
proof is concluded.


We express now the self-decomposability criterion in terms of functions
$\Psi$ which figure in the representation of the i.d. self-decomposable
ch.f.'s.


B. $\Psi$-CRITERION. Self-decomposable laws coincide with i.d. laws with
functions $\Psi$ such that on $(-\infty, 0)$ and on $(0, +\infty)$, their left and right
derivatives, denoted indifferently by $\Psi'(x)$, exist and
$\frac{1+x^2}{x} \Psi'(x)$ do not increase.


Proof. Because of the preceding corollary, the self-decomposability
property of a ch.f. $f$, necessarily of the form $e^{\psi}$, is as follows: for every
$c \in (0, 1)$ the difference $\psi_c(u) = \psi(u) - \psi(cu)$ defines a $\psi$-function
(a log of an i.d. ch.f.).


Upon replacing $x$ by $c^{-1}x$, we can write

(1) $$\psi(cu) = iu\Big\{c\alpha + (1 - c^2) \int \frac{x}{1+x^2}\,d\Psi(c^{-1}x)\Big\} + \int \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big) \frac{1 + c^{-2}x^2}{c^{-2}x^2}\,d\Psi(c^{-1}x).$$

Thus

$$\psi_c(u) = iu\alpha_c + \int \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big) \frac{1+x^2}{x^2}\,d\Psi_c(x),$$

where $\alpha_c$ is a finite number and

(2) $$d\Psi_c(x) = d\Psi(x) - \frac{1 + c^{-2}x^2}{c^{-2}(1+x^2)}\,d\Psi(c^{-1}x), \qquad \Psi_c(-\infty) = 0.$$

Since $\Psi_c$ is a difference of two $\Psi$-functions, its variation on $R$ is bounded.
It follows readily that $\psi_c$ is a $\psi$-function if, and only if, $\Psi_c$ is nondecreasing
on $R$. Since

(3) $$\Psi_c(+0) - \Psi_c(-0) = (1 - c^2)\{\Psi(+0) - \Psi(-0)\} \ge 0,$$

the self-decomposability property becomes $d\Psi_c(x) \ge 0$ for every
$c \in (0, 1)$ and $x \ne 0$ or, equivalently, on account of (2), for every
$c \in (0, 1)$ and arbitrary $x' < x''$, $x'x'' > 0$,

(4) $$\int_{x'}^{x''} \frac{1+y^2}{y^2}\,d\Psi_c(y) = \int_{x'}^{x''} \frac{1+y^2}{y^2}\,d\Psi(y) - \int_{x'}^{x''} \frac{1 + c^{-2}y^2}{c^{-2}y^2}\,d\Psi(c^{-1}y) \ge 0.$$

It remains to show that this last inequality implies and is implied by
the one asserted in the theorem.


If

$$J(x) = \int_{+\infty}^{e^x} \frac{1+y^2}{y^2}\,d\Psi(y), \qquad x \in R,$$

then, by setting in (4) $x' = e^{x-h}$, $x'' = e^x$, $c = e^{-h}$, we obtain

$$J(x) - J(x - h) \ge J(x + h) - J(x) \quad \text{or} \quad J(x) \ge \tfrac{1}{2}\{J(x - h) + J(x + h)\}.$$

Therefore, the nondecreasing finite function $J$ on $R$ is convex (from
above), that is, concave, and, consequently, $J$ is continuous and its left
and right derivatives $J'(x)$ exist and do not increase on $R$. Since

$$\frac{J(x) - J(x-h)}{e^x - e^{x-h}} = \frac{1 + e^{2(x - \theta h)}}{e^{2(x - \theta h)}} \cdot \frac{\Psi(e^x) - \Psi(e^{x-h})}{e^x - e^{x-h}}, \qquad 0 < \theta < 1,$$

it follows, letting $h \to 0$ and setting $e^x = y$, that the left and right
derivatives $\Psi'(y)$ exist and that $\frac{1+y^2}{y} \Psi'(y)$ do not increase on $(0, \infty)$.
Similarly, introducing $J^-(x) = \int_{-\infty}^{-e^x} \frac{1+y^2}{y^2}\,d\Psi(y)$, we find that the
same is true on $(-\infty, 0)$. Thus (4) implies the asserted property of $\Psi$.

Conversely, if this asserted property is true then, for every $c \in (0, 1)$
and $x' < x''$, $x'x'' > 0$

$$\int_{x'}^{x''} \frac{1+y^2}{y^2}\,d\Psi(y) = \int_{x'}^{x''} \frac{1+y^2}{y} \Psi'(y)\,\frac{dy}{y} \ge \int_{x'}^{x''} \frac{1 + c^{-2}y^2}{c^{-1}y} \Psi'(c^{-1}y)\,\frac{dy}{y} = \int_{x'}^{x''} \frac{1 + c^{-2}y^2}{c^{-2}y^2}\,d\Psi(c^{-1}y),$$

so that the inequality in (4) holds and the conclusion is reached.


REMARK. Since Poisson laws correspond to functions $\Psi$ discontinuous
at some $x \ne 0$, they do not belong to the family $\mathfrak{N}$. This explains the
isolation in which they remained as long as only limit laws of normed
sums were considered.


*24.4 Identically distributed summands and stable laws. The first
family $\mathfrak{N}_I$ of limit laws to be investigated by P. Lévy, was that of limit
laws of normed sums $\frac{S_n}{b_n} - a_n$ of independent and identically distributed
summands $X_n$ with an arbitrary common ch.f. $f_0$. In other words,
$\mathfrak{N}_I$ is defined as the family of laws whose ch.f.'s $f$ are such that

$$g_n(u) = e^{-iua_n} f_0^n\Big(\frac{u}{b_n}\Big) \to f(u), \qquad u \in R.$$

Clearly, the uan condition is satisfied, so that $\mathfrak{N}_I \subset \mathfrak{N}$. The
self-decomposability concept and the criteria for $\mathfrak{N}$ are easily particularized
for $\mathfrak{N}_I$, as follows; we exclude degenerate limit laws which, clearly,
belong to $\mathfrak{N}_I$. Let a law and its ch.f. $f$ be called stable if, for arbitrary
$b > 0$, $b' > 0$, there exist finite numbers $a$ and $b'' > 0$ such that

$$f(b''u) = e^{iua} f(bu) f(b'u), \qquad u \in R.$$

Upon replacing $b''u$ by $u$ and setting $c = \frac{b}{b''}$, $c' = \frac{b'}{b''}$, we obtain

$$f(u) = e^{iua/b''} f(cu) f(c'u) = f(cu) f_c(u)$$

where

$$f_c(u) = e^{iua/b''} f(c'u).$$

The self-decomposability criterion for $\mathfrak{N}_I$ becomes


A. STABILITY CRITERION. A law belongs to $\mathfrak{N}_I$ if, and only if, it is
stable.


Proof. The "if" assertion follows from the fact that stability of
$f$ implies, taking $f_0 = f$, that the ch.f. of $S_n$ is of the form
$f^n(u) = e^{iua_n} f(b_n u)$ so that, norming $S_n$ with these quantities $a_n$ and $b_n$,
we have $g_n = f$.

Conversely, leaving out (to simplify the writing) factors of the
form $e^{iua}$, which does not restrict the generality, we have to prove that
$f_0^n\big(\frac{u}{b_n}\big) \to f(u)$, $u \in R$, implies that to arbitrary $b > 0$, $b' > 0$, there
corresponds $b'' > 0$ such that $f(b''u) = f(bu) f(b'u)$. Since $b_n \to \infty$ and
$\frac{b_{n+1}}{b_n} \to 1$, we can assign to every integer $n$ integers $m$ and $m'$ such that
$\frac{b_m}{b_n} \to b$, $\frac{b_{m'}}{b_n} \to b'$. Then

$$f_0^{m+m'}\Big(\frac{u}{b_n}\Big) = f_0^{m}\Big(\frac{b_m}{b_n} \cdot \frac{u}{b_m}\Big) f_0^{m'}\Big(\frac{b_{m'}}{b_n} \cdot \frac{u}{b_{m'}}\Big)$$

and the right-hand side converges to $f(bu) f(b'u)$, while, according to
the convergence of types theorem, there exists $b'' > 0$ such that the
left-hand side converges to $f(b''u)$. The conclusion is reached.


Thus, a stable law is self-decomposable and, moreover, $f_c$ belongs to the
positive type of $f$; in particular $f$ is an i.d. ch.f.


The $\Psi$-criterion for $\mathfrak{N}$ is easily transformed and, furthermore, the
stable ch.f.'s are obtained in terms of elementary functions of analysis,
as follows.


B. A function $f$ is a stable ch.f. if, and only if, either

(i) $$\log f(u) = i\alpha u - b|u|^{\gamma} \Big\{1 + ic \frac{u}{|u|} \tan \frac{\pi}{2}\gamma\Big\}$$

or

(ii) $$\log f(u) = i\alpha u - b|u| \Big\{1 + ic \frac{u}{|u|} \cdot \frac{2}{\pi} \log |u|\Big\}$$

with

$$\alpha \in R, \quad b \ge 0, \quad |c| \le 1, \quad \gamma \in (0, 1) \cup (1, 2].$$

We observe that $\gamma = 2$ gives the normal laws and that real stable
ch.f.'s are of the form $e^{-b|u|^{\gamma}}$, $0 < \gamma \le 2$.

Proof. If the asserted forms of $f$ are ch.f.'s, then they are clearly
stable. Thus, we have to prove that these forms are ch.f.'s and that
stable ones are of this form. The first assertion will follow if we can
determine functions $\Psi$ such that $\log f = (\alpha, \Psi)$.

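Let $f = e^{\psi}$ be a stable ch.f., that is, for arbitrary $b > 0$ and $b' > 0$,
there exist $a$ and $b'' > 0$ such that

$$iua + \psi(bu) + \psi(b'u) = \psi(b''u).$$

1° We follow the pattern of the $\Psi$-criterion's proof (with $c = \frac{b}{b''}$). Upon
replacing $\psi$ by its representation in terms of $\alpha$ and $\Psi$, the foregoing
requirement reduces to

$$\frac{1 + b^{-2}x^2}{b^{-2}}\,d\Psi(b^{-1}x) + \frac{1 + b'^{-2}x^2}{b'^{-2}}\,d\Psi(b'^{-1}x) = \frac{1 + b''^{-2}x^2}{b''^{-2}}\,d\Psi(b''^{-1}x).$$

Upon introducing the functions $J$ and $J^-$ defined on $R$ by

$$J(x) = \int_{+\infty}^{e^x} \frac{1+y^2}{y^2}\,d\Psi(y), \qquad J^-(x) = \int_{-\infty}^{-e^x} \frac{1+y^2}{y^2}\,d\Psi(y), \qquad x \in R,$$

and setting $e^{-h} = b$, $e^{-h'} = b'$, $e^{-h''} = b''$, this requirement becomes

(1) $$\{\Psi(+0) - \Psi(-0)\}(b^2 + b'^2 - b''^2) = 0$$

and

(2) $$J(x + h) + J(x + h') = J(x + h''), \qquad J^-(x + h) + J^-(x + h') = J^-(x + h''), \qquad x \in R,$$

where $h$, $h'$ are arbitrary numbers and $h''$ is a function of $h$ and $h'$.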

Let $\Psi(+\infty) - \Psi(+0) > 0$ so that $J$ does not vanish. If, in the
foregoing relation in $J$, we set repeatedly $h' = h$, it follows that, for
arbitrary positive integers $n$ and $n'$,

$$nJ(x + h) = J(x + h_n), \qquad \frac{n}{n'} J(x + h) = J(x + h_{nn'}).$$

Therefore, to every rational $s > 0$ ($s' > 0$) there corresponds a number
$t$ ($t'$), such that, for every $x$,

(3) $$sJ(x) = J(x + t).$$

Since $J$ is continuous from the left and nondecreasing, with $J \le 0$,
$J(+\infty) = 0$, it follows that $t' \downarrow t$ as $s' \uparrow s$, so that $J$ is continuous, (3)
holds for irrational $s$ as well, and
$t \downarrow -\infty$ as $s \uparrow \infty$.

Since $J$ does not vanish, we can assume, by changing the origin if
necessary, that $J(0) \ne 0$. Then, setting $J_0 = J/J(0)$, it follows, by

$$sJ(0) = J(t), \qquad s'J(0) = J(t'), \qquad sJ(t') = J(t + t'),$$

that

$$J_0(t) J_0(t') = J_0(t + t'), \qquad t, t' \in R.$$

The only nonvanishing continuous solution of this functional equation,
with $J_0(+\infty) = 0$, is proportional to $e^{-\gamma x}$ with $\gamma > 0$. Therefore, setting
$y = e^x$ and going back to $\Psi$, the derivative $\Psi'(y)$ exists for $y > 0$ and

$$\frac{1+y^2}{y^2} \Psi'(y) = \beta' y^{-1-\gamma}, \qquad \beta' \ge 0,$$

taking into account the vanishing case. Since $\Psi$ is of bounded variation
on $(0, +\infty)$, it follows that $\int_0^{\epsilon} y^{1-\gamma}\,dy$ is finite for $\epsilon > 0$ and, hence,
$\gamma < 2$. Furthermore, replacing $J$ in (2) by its above-found expression,
we have

$$b^{\gamma} + b'^{\gamma} = b''^{\gamma}, \qquad 0 < \gamma < 2.$$

Similarly, with $J^-$: for $y < 0$

$$\frac{1+y^2}{y^2} \Psi'(y) = \beta |y|^{-1-\gamma'}, \qquad \beta \ge 0,$$

with $b^{\gamma'} + b'^{\gamma'} = b''^{\gamma'}$ hence $\gamma = \gamma'$ (set $b = b' = 1$).
Therefore, on account of (1), either $b^2 + b'^2 = b''^2$ so that $J$ and $J^-$
vanish and $f$ is a normal ch.f., or $\Psi(+0) - \Psi(-0) = 0$ and, for $y \ne 0$,
$\Psi'(y)$ is given by the foregoing relations.

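2° According to what precedes, a stable ch.f. $f$ is either normal or
of the form

(1) $$\log f(u) = iu\alpha + \beta \int_{-\infty}^{0} \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big) \frac{dx}{|x|^{1+\gamma}} + \beta' \int_{0}^{+\infty} \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big) \frac{dx}{x^{1+\gamma}}.$$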

If $0 < \gamma < 1$, then it is possible to take out of the bracket the term
$\frac{iux}{1+x^2}$ and, by modifying $\alpha$, we obtain

(2) $$\log f(u) = iu\alpha' + \beta \int_{-\infty}^{0} (e^{iux} - 1) \frac{dx}{|x|^{1+\gamma}} + \beta' \int_{0}^{+\infty} (e^{iux} - 1) \frac{dx}{x^{1+\gamma}}.$$

Let $u > 0$. Setting $ux = v$ and integrating along the closed contour
formed by the positive halves of the real and imaginary axes and a
circumference centered at the origin of radius $r \to \infty$, it follows, by
the Cauchy theorem, that

(3) $$\int_{0}^{\infty} (e^{iux} - 1) \frac{dx}{x^{1+\gamma}} = |u|^{\gamma} e^{-\frac{\pi}{2} i\gamma}\,\Gamma(-\gamma),$$

where

$$\Gamma(-\gamma) = \int_{0}^{\infty} (e^{-v} - 1) \frac{dv}{v^{1+\gamma}} < 0.$$

The first integral in (2) follows by taking the complex-conjugate of
(3) and, for $u < 0$, $\log f(u)$ is obtained by taking the complex-conjugate
of $\log f(|u|)$. Upon substituting in (2) and setting

$$b = -\Gamma(-\gamma)(\beta + \beta') \cos \frac{\pi}{2}\gamma, \qquad c = \frac{\beta - \beta'}{\beta + \beta'}$$

so that $b \ge 0$, $|c| \le 1$, we obtain the asserted form (i) of $\log f(u)$.


If $1 < \gamma < 2$, then we can take out of the bracket in (1) the term
$-\frac{iux}{1+x^2} + iux$, and (2) is replaced by

(4) $$\log f(u) = iu\alpha'' + \beta \int_{-\infty}^{0} (e^{iux} - 1 - iux) \frac{dx}{|x|^{1+\gamma}} + \beta' \int_{0}^{+\infty} (e^{iux} - 1 - iux) \frac{dx}{x^{1+\gamma}}.$$

Proceeding as above we obtain the same form (i) of $\log f(u)$.


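If $\gamma = 1$, the foregoing modifications of the third term in the bracket
in (1) are no more possible. But, for $u > 0$,

$$\int_{0}^{\infty} \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big) \frac{dx}{x^2} = \int_{0}^{\infty} \frac{\cos ux - 1}{x^2}\,dx + i \int_{0}^{\infty} \Big(\sin ux - \frac{ux}{1+x^2}\Big) \frac{dx}{x^2}$$

$$= -\frac{\pi}{2} u + iu \lim_{\epsilon \downarrow 0} \Big\{\int_{\epsilon u}^{\infty} \frac{\sin v}{v^2}\,dv - \int_{\epsilon}^{\infty} \frac{dv}{v(1+v^2)}\Big\}$$

$$= -\frac{\pi}{2} u - iu \lim_{\epsilon \downarrow 0} \int_{\epsilon}^{\epsilon u} \frac{\sin v}{v^2}\,dv + iu \lim_{\epsilon \downarrow 0} \int_{\epsilon}^{\infty} \Big(\frac{\sin v}{v^2} - \frac{1}{v(1+v^2)}\Big)\,dv.$$

The limit of the second integral exists and is finite, and that of the first
one is $\log u$. The asserted form (ii) of $\log f(u)$ readily follows, and the
conclusion is reached.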
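
The stability relation $iua + \psi(bu) + \psi(b'u) = \psi(b''u)$ can be checked numerically for the form (i). The following sketch is an illustration under assumptions, not the book's method: it takes $b''^{\gamma} = b^{\gamma} + b'^{\gamma}$ (as found in 1° of the proof) and the centering $a = \alpha(b'' - b - b')$, which is what direct substitution into (i) gives; the function names are hypothetical.

```python
import math

def log_stable_chf(u: float, alpha: float, b: float, c: float, gamma: float) -> complex:
    """log f(u) in form (i); gamma must lie in (0, 1) or (1, 2]."""
    if u == 0.0:
        return 0j
    sign = 1.0 if u > 0 else -1.0
    skew = 1 + 1j * c * sign * math.tan(math.pi * gamma / 2)
    return 1j * alpha * u - b * abs(u) ** gamma * skew

def stability_defect(u, b1, b2, alpha, b, c, gamma) -> float:
    """|i u a + psi(b1 u) + psi(b2 u) - psi(b3 u)| with
    b3 = (b1^gamma + b2^gamma)^(1/gamma) and a = alpha (b3 - b1 - b2)."""
    b3 = (b1 ** gamma + b2 ** gamma) ** (1.0 / gamma)
    a = alpha * (b3 - b1 - b2)  # centering constant assumed from substitution in (i)
    lhs = (1j * u * a
           + log_stable_chf(b1 * u, alpha, b, c, gamma)
           + log_stable_chf(b2 * u, alpha, b, c, gamma))
    return abs(lhs - log_stable_chf(b3 * u, alpha, b, c, gamma))
```

The defect is of the order of rounding error for any admissible parameters, which reflects the remark that the asserted forms, once known to be ch.f.'s, are clearly stable. The form (ii) ($\gamma = 1$) is not covered by this sketch.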


24.5. Lévy representation. This subsection may be read immediately
after 24.1 except for reformulations of results in the intervening
subsections.

So far we used systematically Khintchine representation of i.d. ch.f.'s
$e^{\psi}$ with $\psi = (\alpha, \Psi)$ representing

$$\psi(u) = i\alpha u + \int \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big) \frac{1+x^2}{x^2}\,d\Psi(x)$$

where $\alpha \in R$ and the Khintchine function $\Psi$ is bounded nondecreasing
with $\Psi(-\infty) = 0$, $\Psi(+\infty) < \infty$ or, in terms of the measure which
corresponds biunivoquely to it and is also denoted by $\Psi$, the Khintchine
measure $\Psi$ on $R$ (that is, on the Borel field of $R$), is bounded. $\Psi$ has no
direct probabilistic meaning but presents definite technical advantages:
It permits a simple description of the i.d. family with $\psi = (\alpha, \Psi)$, $\alpha \in R$,
$\Psi$ bounded measure on $R$, as well as a simple description of convergence
of i.d. laws: $\psi_n = (\alpha_n, \Psi_n) \to \psi = (\alpha, \Psi)$ if and only if $\alpha_n \to \alpha$,
$\Psi_n \xrightarrow{c} \Psi$.

In fact, "Lévy representation" below was the initial one and is central
to and born from P. Lévy's probabilistic analysis of decomposable
processes (§41).

Let barred integral sign mean that the origin is excluded from the
interval of integration and, as usual, we omit its endpoints when they
are $-\infty$ and $+\infty$.


P. Lévy representation of i.d. ch.f.'s $e^{\psi}$ with $\psi = (\alpha, \beta^2, L)$ is given by

$$\psi(u) = i\alpha u - \frac{\beta^2}{2} u^2 + \bar{\int} \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\,dL(x)$$

where $\alpha, \beta \in R$ and the Lévy function $L$ defined on $R - \{0\}$ is nondecreasing
on $(-\infty, 0)$ and on $(0, +\infty)$ with $L(\pm\infty) = 0$ and
$\bar{\int}_{-x}^{x} y^2\,dL(y) < \infty$ for some hence every finite $x > 0$. The corresponding
Lévy measure $L$ on $R - \{0\}$ is bounded outside every neighborhood of
the origin but may be infinite on $R - \{0\}$.

The somewhat involved characterization of Lévy function explains 
why Khintchine representation is frequently favored despite its lack 
of direct probabilistic meaning.

The following correspondence is immediate:


a. CORRESPONDENCE LEMMA. There is a one-to-one correspondence between
Lévy and Khintchine representations.

It is given by $\beta^2 = \Psi(+0) - \Psi(-0)$ and

(1) $$dL(x) = \frac{1+x^2}{x^2}\,d\Psi(x), \qquad x \ne 0,$$

or, more precisely, with $x > 0$,

(1') $$L(-x) = \int_{-\infty}^{-x} \frac{1+y^2}{y^2}\,d\Psi(y), \qquad L(x) = -\int_{x}^{+\infty} \frac{1+y^2}{y^2}\,d\Psi(y)$$

and, conversely,

(1'') $$\Psi(-x) = \int_{-\infty}^{-x} \frac{y^2}{1+y^2}\,dL(y), \qquad \Psi(x) = \beta^2 + \bar{\int}_{-\infty}^{x} \frac{y^2}{1+y^2}\,dL(y).$$

The continuity sets $C(L)$ and $C(\Psi)$ are the same on $R - \{0\}$.
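
As a worked check of the correspondence (our illustration, using the normal and Poisson laws of 23.5), the passage from $(\alpha, \Psi)$ to $(\alpha, \beta^2, L)$ reads as follows.

```latex
% Normal law N(alpha, sigma^2): Psi has the single jump sigma^2 at x = 0, so
\beta^2 = \Psi(+0) - \Psi(-0) = \sigma^2, \qquad L \equiv 0, \qquad
\psi(u) = i\alpha u - \frac{\sigma^2}{2}u^2 .

% Poisson law P(lambda): alpha = lambda/2, Psi has the single jump lambda/2 at x = 1, so
\beta^2 = 0, \qquad
dL(\{1\}) = \frac{1+1^2}{1^2}\cdot\frac{\lambda}{2} = \lambda, \qquad
L(x) = -\lambda \ (0 < x \le 1), \quad L(x) = 0 \ (x > 1),

\psi(u) = i\frac{\lambda}{2}u + \lambda\Big(e^{iu} - 1 - \frac{iu}{2}\Big) = \lambda\,(e^{iu} - 1).
```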
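A. I.D. CONVERGENCE CRITERION.
$\psi_n = (\alpha_n, \beta_n^2, L_n) \to \psi = (\alpha, \beta^2, L)$
if and only if

(i) $L_n \xrightarrow{w} L$ on $R - \{0\}$

(ii) $\fint_{-x}^{x} y^2\,dL_n(y) + \beta_n^2 \to \beta^2$ as $n \to \infty$ then $0 < x \to 0$

(iii) $\alpha_n \to \alpha$.
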
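Proof. Since $\psi_n = (\alpha_n, \Psi_n) \to \psi = (\alpha, \Psi)$ if and only if $\alpha_n \to \alpha$ and
$\Psi_n \xrightarrow{c} \Psi$, it suffices to prove that $\Psi_n \xrightarrow{c} \Psi \iff$ (i) and (ii) hold.

We use a and Helly-Bray lemma and theorem without further comment.
Let $x > 0$.

Let $\Psi_n \xrightarrow{c} \Psi$. Clearly (i) follows. Since for $\pm x \in C(\Psi)$

$$\fint_{-x}^{x} \frac{y^2}{1+y^2}\,dL_n(y) + \beta_n^2 = \Psi_n(x) - \Psi_n(-x),$$

(ii) follows as $n \to \infty$ then $0 < x \to 0$ hence without the above restriction
on $x$ since $\Psi(x) - \Psi(-x)$ is monotone in $x$.

Conversely, let (i) and (ii) hold. Clearly $\Psi_n(-x) \to \Psi(-x)$ for
$-x \in C(L)$. For $0 < \varepsilon < x \in C(L)$, from

$$\Psi_n(x) = \int_{-\infty}^{-\varepsilon} \frac{y^2}{1+y^2}\,dL_n(y) + \fint_{-\varepsilon}^{+\varepsilon} \frac{y^2}{1+y^2}\,dL_n(y) + \beta_n^2 + \int_{\varepsilon}^{x} \frac{y^2}{1+y^2}\,dL_n(y)$$

it follows that, as $n \to \infty$ then $\varepsilon \to 0$,

$$\Psi_n(x) \to \Psi(-0) + \beta^2 + \Psi(x) - \Psi(+0) = \Psi(x).$$

The same is true for $x = +\infty$ so that $\Psi_n(+\infty) \to \Psi(+\infty)$. Thus
$\Psi_n \xrightarrow{c} \Psi$ and the proof is terminated.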


*Reformulations. Lévy representation is visible in the main results and
also in the proofs in the preceding subsections:


1. EXTREMA CRITERION. Its statement in 23.4C is already in terms of
Lévy function $L$ and of $\beta^2 = \Psi(+0) - \Psi(-0)$ of the i.d. limit law.


2. EXTENDED CENTRAL CONVERGENCE CRITERION. This most important
result of the section 23.4B is to be reformulated as follows.
Let $x > 0$ and set

$$L_n(-x) = \sum_k F_{nk}(-x), \qquad L_n(x) = \sum_k \big(F_{nk}(x) - 1\big).$$

Then, in terms of $L$ and $\beta^2$ of the limit i.d. law, the criterion conditions
are

$$L_n \xrightarrow{w} L \quad\text{and}\quad \sum_k \sigma^2 X_{nk}^{\varepsilon} \to \beta^2 \ \text{ as } n \to \infty \text{ then } 0 < \varepsilon \to 0.$$

Furthermore, Lévy functions $L_n$ have a direct probabilistic meaning in
terms of the summands $X_{nk}$, $k = 1, \cdots, n$:

$$L_n(-x) = E(\text{number of the } X_{nk} \text{ in } (-\infty, -x))$$
$$-L_n(x) = E(\text{number of the } X_{nk} \text{ in } [x, \infty)).$$


3. The proof of the W-criterion 24.3B is, in fact, in terms of Z. For, 
the functions 7 and ¥~ therein are given by F() = —L(e?) and F(x) = 
L(—e?*). 


Lévy functions of stable laws within the proof of 24.4B are:

$\gamma = 2$: $L \equiv 0$ (normal law);
$0 < \gamma < 2$: $dL(x) = \beta/|x|^{1+\gamma}\,dx$ for $x < 0$,
$dL(x) = \beta'/|x|^{1+\gamma}\,dx$ for $x > 0$.


CLP for iid summands. In what follows, $f$, and $f_n$ are ch.f.'s.
We intend to solve directly the Central Limit Problem (CLP for short)
for independent identically distributed (iid for short) summands. We
shall use A and generalize results in 24.1 replacing $f_n^n = f$ for every $n$
by $f_n^n \to f$.

b. If $f_n \to 1$ in some neighborhood $[-U, +U]$ of the origin then on
$[-U, +U]$, from some $n = n(U)$ on, $\log f_n$ exist and are bounded and

$$\log f_n = -\sum_{m=1}^{\infty} \frac{1}{m}(1 - f_n)^m = (f_n - 1)(1 + o(1)).$$

For, on $[-U, +U]$, $f_n \to 1$ uniformly so that, from some $n = n(U)$ on,
$|1 - f_n| < 1/2$ hence $\log f_n$ exists and is continuous and thus is bounded,
and

$$\log f_n = \log(1 - (1 - f_n)) = -(1 - f_n) - \tfrac{1}{2}(1 - f_n)^2 - \cdots = (f_n - 1)(1 + o(1)).$$
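The expansion in b can be checked numerically. The sketch below is only an illustration under stated assumptions: the choice of $f_n$ as the ch.f. of a Poisson law with mean $1/n$ (so that $f_n \to 1$), and the helper names `log_series`, `poisson_chf`, `relative_gap`, are not from the text.

```python
import cmath

def log_series(f, terms=200):
    """Return -sum_{m=1}^{terms} (1 - f)^m / m, the series of lemma b (needs |1 - f| < 1)."""
    z = 1 - f
    total = 0j
    power = 1 + 0j
    for m in range(1, terms + 1):
        power *= z          # power == (1 - f)^m
        total -= power / m
    return total

def poisson_chf(u, n):
    """Hypothetical f_n: ch.f. of a Poisson law with mean 1/n, so f_n -> 1 as n grows."""
    return cmath.exp((cmath.exp(1j * u) - 1) / n)

def relative_gap(u, n):
    """|log f_n / (f_n - 1) - 1|, i.e. the size of the o(1) term at the point u."""
    f = poisson_chf(u, n)
    return abs(log_series(f) / (f - 1) - 1)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, relative_gap(1.5, n))
```

For this particular family the gap shrinks roughly like $1/n$, which is what the factor $1 + o(1)$ expresses.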
We generalize 24.1b: 
c. If $f_n^n \to f$ then $f$ has no zeros and the same is true when $e^{-iu\alpha_n} f_n^n(u) \to f(u)$
for every $u \in R$.


Proof. It suffices to prove that ch.f.'s $(|f_n|^2)^n \to |f|^2$ implies $|f|^2 > 0$.
Suppose this "symmetrization" already took place so that $f_n^n \to f$ with
$f_n$ and $f \ge 0$.

Since $f$ is continuous with $f(0) = 1$, there is a finite interval $[-U, +U]$
on which $f > 0$ hence $\log f$ exists and is bounded. On this interval, from
some $n$ on, $\log f_n$ exist and are bounded, so that $n \log f_n \to \log f$ hence
$\log f_n \to 0$, that is, $f_n \to 1$, b applies,

$$n(f_n - 1)(1 + o(1)) = n \log f_n \to \log f$$

and $n(f_n - 1)$ remain bounded. Since, by 13.4A,

$$n(1 - f_n(2u)) \le 4n(1 - f_n(u)),$$

it follows that on $[-2U, +2U]$, from some $n$ on, $n(1 - f_n) \ge 0$ remain
bounded, so that $f_n \to 1$, b applies and $e^{n(f_n - 1)} \to f > 0$.

Upon continuing this doubling of the intervals, any given $u \in R$ belongs
to an interval on which $f > 0$ hence $f > 0$ on $R$, and the proposition is
proved.


B. IID CONVERGENCE CRITERION. Let $\psi$ be continuous.

$$f_n^n \to f \iff n(f_n - 1) \to \psi, \text{ and then } f = e^{\psi} \text{ is i.d.}$$

More generally, if $f_n \to 1$ or $\alpha_n/n \to 0$ then, for every $u \in R$,

$$e^{-iu\alpha_n} f_n^n(u) \to f(u) \iff -iu\alpha_n + n(f_n(u) - 1) \to \psi(u),$$

and then $f = e^{\psi}$ is i.d.


Proof. 1°. Let $n(f_n - 1) \to \psi$, so that $f_n \to 1$, b applies, and $f_n^n \to e^{\psi} = f$.

Conversely, let $f_n^n \to f$ so that, by c, $f$ has no zeros and $\log f$ exists
and is continuous. Given any finite interval, it follows that on it, from
some $n$ on, $\log f_n$ exist and are bounded and, by b, $n(f_n - 1) \to \log f = \psi$.


2°. Let $-iu\alpha_n + n(f_n(u) - 1) \to \psi(u)$ for every $u \in R$ so that
$-iu\alpha_n/n + f_n(u) - 1 \to 0$ hence $\alpha_n/n \to 0 \iff f_n \to 1$. With either of
these equivalent conditions b applies and, for every $u \in R$, from some
$n = n(u)$ on,

$$e^{-iu\alpha_n} f_n^n(u) = \big(e^{-iu\alpha_n/n} f_n(u)\big)^n \to e^{\psi(u)} = f(u).$$

Conversely, let for every $u \in R$,

$$\big(e^{-iu\alpha_n/n} f_n(u)\big)^n = e^{-iu\alpha_n} f_n^n(u) \to f(u)$$

so that, by c, $f(u) \neq 0$ hence $e^{-iu\alpha_n/n} f_n(u) \to 1$. Thus, once more,
$\alpha_n/n \to 0 \iff f_n \to 1$ and, with either of these equivalent conditions, b applies
and $-iu\alpha_n + n(f_n(u) - 1) \to \log f(u) = \psi(u)$.


It remains to show that the limit ch.f. $f$ is i.d. This will follow from
the "structure" proposition below. In fact, this proposition provides a
widening of the definition in 23.1 of i.d. laws since $f_n^n = f$ for every $n$
implies $f_n^n \to f$ but, in general, the converse is not true. It also provides
a direct probabilistic proof of the structure theorem in 23.1:


Let $S_0 = 0$, $S_n = X_1 + \cdots + X_n$, $n = 1, 2, \cdots$, where the summands
are iid with common ch.f. $f$. Let $\lambda \ge 0$. We say that a r.v. $S$ is
$(\lambda, f)$-compound Poisson if its d.f. is

$$F_S = e^{-\lambda} \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} F_{S_n}.$$

Clearly $F_S$ is a d.f.: it is nondecreasing with $F_S(-\infty) = 0$,
$F_S(+\infty) = e^{-\lambda} \sum_{n=0}^{\infty} \lambda^n/n! = 1$. The corresponding ch.f. is immediate:

$$f_S = e^{-\lambda} \sum_{n=0}^{\infty} \frac{\lambda^n}{n!} f^n = e^{\lambda(f - 1)}.$$

It is an i.d. ch.f., since $e^{(\lambda/m)(f - 1)}$ is the ch.f. of a $(\lambda/m, f)$-compound Poisson
for $m = 1, 2, \cdots$. Also the centered ch.f. $e^{-iu\alpha + \lambda(f - 1)}$ is i.d. and the i.d.
assertion in B follows at once. And B yields


STRUCTURE COROLLARY. $f$ is i.d. if and only if there are compound
Poisson $f_n$ with $f_n^n \to f$.
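As a numerical illustration of the compound Poisson ch.f. and of the approximation behind the structure corollary, the following sketch compares the series defining $f_S$ with the closed form $e^{\lambda(f-1)}$, and then approximates the normal ch.f. $e^{-u^2/2}$ by the compound Poisson ch.f.'s $e^{n(g_n - 1)}$ with $g_n = e^{-u^2/2n}$. The normal example and all function names are assumptions chosen for illustration, not taken from the text.

```python
import cmath
import math

def compound_poisson_chf(lam, f_value):
    """Closed form e^{lam (f - 1)} of the (lam, f)-compound Poisson ch.f. at one point."""
    return cmath.exp(lam * (f_value - 1))

def compound_poisson_chf_series(lam, f_value, terms=80):
    """Partial sum of e^{-lam} sum_n lam^n / n! * f^n, the ch.f. read off the d.f. F_S."""
    total = 0j
    weight = math.exp(-lam)   # e^{-lam} lam^n / n! for n = 0
    power = 1 + 0j            # f^n for n = 0
    for n in range(terms):
        total += weight * power
        weight *= lam / (n + 1)
        power *= f_value
    return total

def normal_chf(u):
    """ch.f. of the normal law N(0, 1)."""
    return math.exp(-u * u / 2)

def approximation_error(u, n):
    """|e^{n (g_n - 1)} - f| with g_n = f^{1/n} for the normal f (illustrative choice)."""
    g_n = math.exp(-u * u / (2 * n))
    return abs(compound_poisson_chf(n, g_n) - normal_chf(u))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, approximation_error(1.0, n))
```

In this example the error decreases roughly like $1/n$, in agreement with $n(g_n - 1) \to \log f$.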


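C. IID CENTRAL CONVERGENCE CRITERION. Let $X_{nk}$, $k = 1, \cdots, n$, be
iid summands with common d.f. $F_n$ and ch.f. $f_n \to 1$. Let $x > 0$.

$\mathcal{L}(\sum_k X_{nk} - \alpha_n) \to \mathcal{L}(X)$ necessarily i.d. with $\psi = (\alpha, \beta^2, L)$
if and only if

$(C_L)$: $L_n \xrightarrow{w} L$ with $L_n$ defined by
$L_n(-x) = nF_n(-x)$, $L_n(x) = n(F_n(x) - 1)$, $x > 0$.

$(C_{\beta^2})$: $n \int_{|y| < x} y^2\,dF_n(y) \to \beta^2$ as $n \to \infty$ then $x \to 0$.

$(C_\alpha)$: $\alpha_n = a_n - \alpha + o(1)$ with $a_n = n \int \frac{x}{1+x^2}\,dF_n(x)$.

Note that $(C_\alpha)$ characterizes all admissible $\alpha_n$.
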


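Proof. According to B, the required convergence is equivalent to
$\psi_n(u) = -iu\alpha_n + n \int (e^{iux} - 1)\,dF_n(x) \to \psi(u)$, $u \in R$ where, setting

$$a_n = n \int \frac{x}{1+x^2}\,dF_n(x),$$

$$\psi_n(u) = iu(a_n - \alpha_n) + n \int \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\,dF_n(x) = (a_n - \alpha_n, \beta_n^2, L_n),$$

with $L_n$ defined by

$$L_n(-x) = nF_n(-x), \qquad L_n(x) = n(F_n(x) - 1), \quad x > 0,$$

corresponding $\Psi_n$ defined by

$$\Psi_n(z) = n \int_{-\infty}^{z} \frac{y^2}{1+y^2}\,dF_n(y), \quad z \in R,$$

and $\beta_n^2$ determined by

$$n \int_{-x}^{x} y^2\,dF_n(y) = \int_{-x}^{x} (1 + y^2)\,d\Psi_n(y) = \fint_{-x}^{x} y^2\,dL_n(y) + \beta_n^2.$$

The asserted criterion follows at once from A.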
COMPLEMENTS AND DETAILS


1. Prove Lindeberg’s theorem without using Liapounov’s bounded case 
theorem. Then deduce Liapounov’s theorem. 
For Lindeberg’s theorem use the expansion 


2 2 
A“) =] — 73 u? + Onk = (cota + | x? aFu(s)) 


Sn n zlZesn 


To deduce Liapounov’s theorem observe that 


248 
2 oar. <b. El xE PH 
2 ware = 5 
Sn” J\zlzesn € Sy2t 


2. Prove directly the sufficiency of Kolmogorov’s conditions for degenerate 
convergence. Then deduce the condition in (1 + 6). 

3. Deduce the Kolmogorov and Lindeberg-Feller theorems from the degen- 
erate and normal convergence criteria—where existence of moments is not 
assumed. 

4, Deduce the bounded variances limit theorem from. the Central Limit 
theorem. 

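5. Let $\sum_k X_{nk}$ be sums of independent uan summands centered at expectations
with $\sum_k \sigma^2 X_{nk} = 1$ whatever be $n$. Then

$$\mathcal{L}\Big(\sum_k X_{nk}\Big) \to \mathcal{N}(0,1) \iff \sum_k X_{nk}^2 \xrightarrow{P} 1.$$

(Observe that the last convergence is equivalent to $\sum_k \int_{|x| \ge \varepsilon} x^2\,dF_{nk} \to 0$ whatever
be $\varepsilon > 0$.)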
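6. Let $\zeta(t + iu)$, $t > 1$, be the Riemann function defined by

$$\zeta(t + iu) = \sum_n n^{-(t+iu)} = \prod_p \big(1 - p^{-(t+iu)}\big)^{-1}$$

where $p$ varies over all primes. $f_t(u) = \zeta(t + iu)/\zeta(t)$ is an i.d. ch.f.
$\big(\log f_t(u) = \sum_p \sum_n p^{-nt}(e^{-inu \log p} - 1)/n.\big)$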


7. An i.d. law may be composed of two non i.d. laws. In fact, there exists a 
non i.d. ch.f. f such that | f |? is i.d.: form the ch.f. f of X with PLX = —1] = 
pl — p)/A +p), PIX = kh =(1—p)l+p)p*/(i +p), k = 0,1, °°, 0< 

<i. 

? (Put f in the form (a, VY); observe that V so found does not satisfy the neces- 
sary requirements. Put | /|? in the form (a, W).) 

8. An i.d. law may be composed of an i.d. law and an indecomposable one:
let $X = 0$ or $1$ with pr.'s 2/3 and 1/3, respectively; the ch.f. $f$ is indecomposable and

$$\log f(u) = \log \frac{2 + e^{iu}}{3} = \sum_n a_n(e^{inu} - 1), \qquad \sum |a_n| < \infty.$$

Set

$$\log f^+(u) = \sum\nolimits^+ a_n(e^{inu} - 1), \qquad \log f^-(u) = -\sum\nolimits^- a_n(e^{inu} - 1)$$

where $\sum^+$ ($\sum^-$) denotes summation over positive (negative) $a_n$. Then $f^+$
and $f^-$ are i.d. and $f^+ = f f^-$.
Also an i.d. law may be the product of an i.d. law and two indecomposable
ones: proceed as above but with $f$ defined by

$$f(u) = \frac{5 + 4\cos u}{9} = \Big|\frac{2 + e^{iu}}{3}\Big|^2.$$

9. P. Lévy centering function. The family of i.d. laws coincides with laws
defined by

$$\log f(u) = i\alpha u - \frac{\beta^2}{2}u^2 + \fint \Big(e^{iux} - 1 - \frac{iux}{1+x^2}\Big)\,dL(x)$$

where $L$ is defined on $R$, except at the origin, is nondecreasing on $(-\infty, -0)$
and on $(+0, +\infty)$, with $L(\mp\infty) = 0$ and $\fint_{-\tau}^{\tau} x^2\,dL(x) < \infty$ for some $\tau > 0$; the
barred integral sign means that the origin is excluded.
Also

$$\log f(u) = i\alpha(\tau)u - \frac{\beta^2}{2}u^2 + \fint_{-\tau}^{\tau} (e^{iux} - 1 - iux)\,dL(x) + \Big(\int_{-\infty}^{-\tau} + \int_{\tau}^{+\infty}\Big)(e^{iux} - 1)\,dL(x).$$

This splitting of the domain of integration replaces the P. Lévy centering
function $g(x) = x/(1 + x^2)$ by much simpler ones ($g(x) = x$ and $g(x) = 0$)
within the partial domains of integration.

Why was the centering function needed? Then, what are the conditions to
impose upon it? Show that Feller's centering function $g(x) = \sin x$ is acceptable.
Is the following one acceptable: $g(x) = x$ for $|x| < c$ for some finite positive
constant $c$, $g(x) = c$ for $x \ge c$ and $g(x) = -c$ for $x \le -c$?

JO. Let r.v.’s Xnx with d.f.’s Fax, k= 1,°+', kn 7 ©, wm = 1,2,---, be 
independent in & and uniformly asymptotically distributed in k&, that is, there 
exist d.f.’s F, such that Fax — Fn, — 0 uniformly in k. The nondecreasingly 
ranked numbers Xn,4(w) into X*,,1(w) S SX nk (w) determine “ ‘ranked”’ 
A*,,r of “rank” r; the *Xn,5 = Xnk,+1~-s are of ‘end rank” s. Set 


Ln = do Fn M, = 2 (Fae — 1), 


En,rn = (Tn —_ » Fra)/V 2 Pal - Fy.) 
= d Inuks In = In —_ EIn)/olIn, Lnn(*) = Lixnk <zJ- 


Use throughout the fundamental relation 
[X* nr <x] = U(x) 2 7]. 


a) The X*,,, are r.v.’s. 
b) For fixed ranks $r$, the class of limit laws of ranked r.v.'s $X^*_{nr}$ is that of laws
$\mathcal{L}(X^*_r)$ with d.f.'s $F^*_r = \frac{1}{(r-1)!} \int_0^{L} t^{r-1} e^{-t}\,dt$, where the functions $L$ on $R$ are
nondecreasing, nonnegative, and not necessarily finite.
These limit laws are laws of r.v.'s if and only if $L(-\infty) = 0$, $L(+\infty) = +\infty$.
And

$$F^*_{nr} \to F^*_r \iff L_n \to L.$$

c) For fixed endranks $s$, the class of limit laws of ranked r.v.'s ${}^*X_{ns}$ is that of
laws $\mathcal{L}({}^*X_s)$ with d.f.'s ${}^*F_s = \frac{1}{(s-1)!} \int_{-M}^{+\infty} t^{s-1} e^{-t}\,dt$ where the functions $M$ on $R$
are nondecreasing, nonpositive, and not necessarily finite.
These limit laws are laws of r.v.'s if and only if $M(-\infty) = -\infty$, $M(+\infty) = 0$.
And

$${}^*F_{ns} \to {}^*F_s \iff M_n \to M.$$


d) For variable ranks rn — © with Sa = kn +1 —7rn — ©, the class of limit 


, . 1 ° 
laws of ranked r.v.’s X*,,,, 1s that of laws with d.f.’s FS = Tad, e712 dt. 
Te 
where the functions g on R are nonincreasing, and not necessarily finite. 
These limit laws are those of r.v.’s if and only if g(—0%) = +0, g(+0) = —o, 
And 


F* nyrn “s FES gnyrn * £. 
e) What if the X,, are uniformly asymptotically negligible? What if, moreover, 


f) What about joint limit laws of ranked r.v.’s? 
11. Let £(Xn — an) @ (a, B?, L) where Xn = >) Xnz are sums of uan inde- 
k 


pendent r.v.’s. 
(a) The sequence £(max | Xx |) converges. Find the limit law £(X). Why 
k 


can necessary and sufficient conditions for normality of the limit law of the 
sequence £(X;, — an) be expressed in terms of £(X)? Are there other 1.d. laws 
for which this is possible? (For » sufficiently large and x > 0 


log P{max | Xnz| < x] = —(1 + o(1)) X P[| Xnx| 2 +). 
(b) Let anx = | x dF, T > O finite, Far(x) = Far(x + @nz) and let F'ne 
z| <r 


be the d.f. of X’n~n = | Xnk — ane |" for a fixed r > 1. 
If £025 Xnz — an) — (a, B2, L), then there exist constants a’, such that 
i 


LC Xn — an)  (a’, 0, L’) with L’(x) = 0 or L(x) — L(—x"*) according 
k 
asx <Qorx>0. (If g 2 0 is even, then, for every ¢ > 0, 


fie dP nk = Sia eur « |") dF nk J g 4B nk = fasan8l x |") dF nk. 
Take g = 1 and g(x) = x”. Observe that 
e € 2 
2 yo ’ 2(r—1)/r 205r 
OS farm —([ earn) Seer dF) 


(c) £(5 Xn — @n) — I(0, B?) if, and only if, 5X", 5 62 What about 
k k 


limit Poisson laws? 

In what follows and unless otherwise stated, degenerate laws are excluded; 
f, with or without affixes, is a ch.f.; and, without restricting the generality, the 
type of f is the family of all ch.f.’s defined by /(cu) for some c > 0. 

12. fis decomposable by every /", n = 2,3, ---, if, and only if, fis degenerate. 

13. f is decomposable and every component belongs to its type with f(u) = 
> f(cjz), Doe? 2 1, if and only if f is normal. 

1/4. If for anr > 0 and #1, /* belongs to the type of f, then / is i.d. If there 
are two such values r’ and r” of r and log r’/log r” is irrational, then f is stable. 

15. lf fa oA f'n Sf and fn = f'n f'n for every n, then f’ is a component 
of f. 

16. f is c-decomposable if /(u) = f(cu)f.(u) for some fixed ¢ necessarily be- 
tween 0 and 1. LZ, is the family of all c-decomposable laws, Zo is the family of 
all laws, and LZ, is that of self-decomposable ones. 

(a) Lo D L, D Ly, and if log c/log c’ is rational, then ZL, = L,. Every L, 
is closed under compositions and passages to the limit. 

(b) fE L, if, and only if, it is limit of a sequence of ch.f.’s of normed sums 
Sn/b, of independent r.v.’s with bn/dn41 — ¢. 


(c) f€ L, if, and only if, it is ch.f. of X(c) = D> &c* where the law of the 
k=0 


series converges and the & are independent and identically distributed. Then 
the series converges a.s., and fe, = fe. If & is bounded, then f is not 1.d. 

(d) g(x) is said to be y-convex (y > 0 fixed) if every polygonal line inscribed 
in its graph with vertices projecting-at distance y on the x-axis is convex. 

If & is i.d., so is X(c). fi.d. with Lévy’s function Z belongs to L, and /; is 
id. only if (—1)’M; are y-convex for y = | log ¢ | where M; are defined as in 9. 
Is the converse true? 

(e) If E& =0, o%& = 1, then, for ¢, c’€ (—1,+1), the covariance 
EX(c)X(c’) = 1/(1 — ce’), and the random function X(c) on (—1, +1) exists 


in q.m. and is continuous and indefinitely differentiable in q.m. 


Chapter VII 


INDEPENDENT IDENTICALLY 
DISTRIBUTED SUMMANDS 


This chapter is devoted to the study in some depth of consecutive sums
$S_1, S_2, \cdots$ of sequences of independent identically distributed
summands $X_1, X_2, \cdots$ with common law $\mathcal{L}(X)$; we shorten "independent
identically distributed" to iid. As usual, methods are emphasized.
Methods and results took their definitive form in the third quarter of
this century.


In the preceding chapters some results about iid summands were ob- 
tained: Kolmogorov law of large numbers (17.3B) and its generalization 
17.4, 4°, convergence of laws of normed sums to normal when the 
summands have finite second moments (21.1A) and the far-reaching 
characterization of all limit laws of normed sums (24.4), by particular- 
izing the solution of the general central limit problem. 

In this chapter, using directly 24.5, by means of Karamata theory, we
obtain in §25 the above limit "stable" laws and their "domains of
attraction": those families of laws for which the laws of normed sums
$S_n/b_n - a_n$ converge to any given stable one.

In §26, we study "random walks": sequences of sums $S_1, S_2, \cdots$
themselves (not normed), their global and asymptotic behaviour with
their dichotomy into "recurrent" and "transient" ones, and their
fascinating "finite fluctuations."


§25. REGULAR VARIATION AND DOMAINS OF ATTRACTION 


The domain of attraction of the normal law was found by P. Lévy, by
Feller, and by Khintchine. The domains of attraction of all other stable
laws were discovered by Doeblin and by Gnedenko. Much later, Feller
observed that these results were in terms of Karamata regular variation
theory and showed its usefulness for various limit probability problems.
We follow his presentation of Karamata theory, and then apply it to the
problem of stable laws and their domains of attraction. It seems
advisable that at the first reading only A and its Corollary be covered in
25.1 and c in 25.2 be assumed.


25.1 Regular variation. Let $U$, $V$ be positive monotone functions on
$[0, \infty)$ to $[0, \infty)$ and let $x$, $y$ be positive.

We say that $U$ varies regularly (at $+\infty$) with exponent $\alpha \in R$ if
$U(x) = x^{\alpha}V(x)$ where $V$ varies slowly (at $+\infty$), that is, $V(tx)/V(t) \to 1$
as $t \to \infty$ for every $x$. Thus slow variation is regular variation with
exponent 0. Since our only concern is with behaviour at $+\infty$, we may
take $x, y > c \in R$ with $c > 0$ arbitrary but fixed, or substitute $(c, \infty)$
for $[0, \infty)$, or assume that $U$, $V$ vanish on $[0, c]$; this will be done without
further comment.


A. REGULAR VARIATION CRITERION. Let $D$ be a set dense in $[0, \infty)$.
$U$ varies regularly if and only if, for every $x \in D$,

$$U(tx)/U(t) \to h(x) < \infty \ \text{ as } t \to \infty,$$

and then $h(x) = x^{\alpha}$ for some $\alpha \in R$.


Proof. The "only if" assertion is trivially true. As for the "if"
assertion, letting $t \to \infty$ in

$$\frac{U(txy)}{U(t)} = \frac{U(txy)}{U(ty)} \cdot \frac{U(ty)}{U(t)}$$

it follows that

$$h(xy) = h(x)h(y) \ \text{ for } x, y \in D.$$

Since $U$ is monotone, this functional equation extends to $[0, \infty)$ by taking
limits from the right. But then it has a unique finite solution of the form
$h(x) = x^{\alpha}$ for some $\alpha \in R$, and the proof is terminated.
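The criterion can be illustrated numerically. In the sketch below the function $U(x) = x^{\alpha}\log(1 + x)$ is an assumed example (a power times a slowly varying factor), and the names `U` and `ratio` are hypothetical; the ratios $U(tx)/U(t)$ approach $x^{\alpha}$, though only at a logarithmic rate.

```python
import math

def U(x, alpha=1.5):
    """Assumed example: x^alpha times the slowly varying factor log(1 + x)."""
    return x ** alpha * math.log(1 + x)

def ratio(t, x, alpha=1.5):
    """U(tx) / U(t), which should tend to x^alpha as t grows."""
    return U(t * x, alpha) / U(t, alpha)

def relative_error(t, x, alpha=1.5):
    """Relative distance between U(tx)/U(t) and the limit h(x) = x^alpha."""
    return abs(ratio(t, x, alpha) / x ** alpha - 1)

if __name__ == "__main__":
    for t in (1e2, 1e4, 1e8):
        print(t, ratio(t, 2.0), 2.0 ** 1.5, relative_error(t, 2.0))
```

The slow decay of the error reflects the slowly varying factor: here $V(tx)/V(t) - 1$ is of order $\log x/\log t$.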


COROLLARY. If for every $x \in D$ dense in $(0, \infty)$,

$$c_n U(b_n x) \to h(x) \ \text{ finite positive}$$

and

$$b_n \to \infty, \qquad c_{n+1}/c_n \to 1,$$

then $U$ varies regularly and $h(x) = cx^{\alpha}$ for some finite $\alpha$ and $c > 0$.

Proof. If $n$ is the smallest integer such that $b_n \le t < b_{n+1}$ then

$$\frac{U(b_n x)}{U(b_{n+1})} \le \frac{U(tx)}{U(t)} \le \frac{U(b_{n+1} x)}{U(b_n)}$$

when $U$ is nondecreasing, while these inequalities are reversed when $U$ is
nonincreasing. By a change of scale we may assume that $1 \in D$. Then,
since $c_{n+1}/c_n \to 1$ and $c_n U(b_n) \to h(1) = c > 0$, for every $x \in D$ the
extreme terms converge to $h(x)/c$ hence $U(tx)/U(t) \to h(x)/c$, the above
criterion applies and $h(x)/c = x^{\alpha}$ for some $\alpha \in R$.


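*Let $H$ be a positive monotone function on $[0, \infty)$ and set

$$U_a(x) = \int_0^x y^a H(y)\,dy, \qquad V_a(x) = \int_x^{\infty} y^a H(y)\,dy$$

where $x > 0$ and $a$ are finite.
Upon replacing if necessary 0 by $c > 0$, or assuming that $H$ vanishes on
$[0, c]$, $U_a(x)$ will be finite while $V_a(x)$ may be infinite. Since

$$U_a(x) \uparrow U_a(\infty) \ \text{ and } \ V_a(x) \downarrow V_a(\infty) \ \text{ as } x \uparrow \infty$$

while

$$U_a(\infty) = U_a(x) + V_a(x) \ \text{ hence } \ U_a(\infty) = U_a(\infty) + V_a(\infty),$$

it follows that

$$U_a(\infty) < \infty \iff V_a(\infty) = 0 \iff V_a(x) < \infty \ \text{ from some } x \text{ on}$$
$$U_a(\infty) = \infty \iff V_a(x) = \infty \ \text{ for every } x \iff V_a(\infty) = \infty.$$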


a. Let $H$ vary slowly. Then $U_a(\infty)$ and $V_a(x)$ are finite for $a < -1$
and infinite for $a > -1$. Furthermore

(i) If $a \ge -1$ then $U_a$ varies regularly with exponent $a + 1$.
(ii) If $a < -1$ then $V_a$ varies regularly with exponent $a + 1$, and this
still holds for $a = -1$ provided $V_{-1}$ is finite.


Proof. Given $x > 0$ and $\varepsilon > 0$, slow variation of $H$ implies existence
of $\delta > 0$ such that, for $y > \delta$,

$$(1)\qquad (1 - \varepsilon)H(y) \le H(xy) \le (1 + \varepsilon)H(y).$$

1°. Let $V_a(\infty) = 0$ hence $V_a(x) < \infty$ from some $x$ on, and $U_a(\infty) < \infty$.
Since

$$V_a(tx) = x^{a+1} \int_t^{\infty} y^a H(xy)\,dy,$$

it follows that, for $t > \delta$,

$$(1 - \varepsilon)x^{a+1}V_a(t) \le V_a(tx) \le (1 + \varepsilon)x^{a+1}V_a(t)$$

hence, letting $t \to \infty$ then $\varepsilon \to 0$, $V_a(tx)/V_a(t) \to x^{a+1}$. Thus, $V_a$ varies
regularly with exponent $a + 1 \le 0$ since $V_a$ is nonincreasing, and
$U_a(\infty) < \infty$ with $V_a(\infty) = 0$ only if $a \le -1$.

2°. Let $U_a(\infty) = \infty$ hence $V_a(x) = \infty$. Since, for $t > \delta$,

$$U_a(tx) = U_a(\delta x) + x^{a+1} \int_{\delta}^{t} y^a H(xy)\,dy$$

hence, by (1),

$$(1 - \varepsilon)x^{a+1}\big(U_a(t) - U_a(\delta)\big) \le U_a(tx) - U_a(\delta x) \le (1 + \varepsilon)x^{a+1}\big(U_a(t) - U_a(\delta)\big);$$

upon dividing by $U_a(t)$ and letting $t \to \infty$ then $\varepsilon \to 0$, it follows that
$U_a(tx)/U_a(t) \to x^{a+1}$. Thus, $U_a$ varies regularly with exponent
$a + 1 \ge 0$ since $U_a$ is nondecreasing, and $U_a(\infty) = \infty$ hence $V_a(x) = \infty$
only if $a \ge -1$. The assertions follow from 1° and 2°.


*B. MAIN KARAMATA THEOREM. Let $H$ be positive monotone on $(0, \infty)$
and set

$$U_a(x) = \int_0^x y^a H(y)\,dy, \qquad V_a(x) = \int_x^{\infty} y^a H(y)\,dy.$$

(i) If $H$ varies regularly with exponent $b \le -a - 1$ and $V_a(x) < \infty$
then, as $t \to \infty$,

$$t^{a+1}H(t)/V_a(t) \to c = -(a + b + 1) \ge 0.$$

Conversely, if this limit exists and is positive then $V_a$ and $H$ vary regularly
with exponents $-c = a + b + 1$ and $b$, respectively, while if this limit is 0
then $V_a$ varies slowly.


(ii) If $H$ varies regularly with exponent $b \ge -a - 1$ then, as $t \to \infty$,

$$t^{a+1}H(t)/U_a(t) \to c = a + b + 1 \ge 0.$$

Conversely, if this limit exists and is positive then $U_a$ and $H$ vary regularly
with exponents $c = a + b + 1$ and $b$, respectively, while if this limit is 0
then $U_a$ varies slowly.


Note that when $c = 0$ the converse assertions for $c > 0$ continue to
hold for $V_a$ and for $U_a$, but nothing can be asserted regarding $H$.
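Part (ii) of the theorem can be checked on an example where the integral is available in closed form. The sketch below assumes $H(y) = y^b(1 + \log y)$ on $[1, \infty)$ (regularly varying with exponent $b$, taken to vanish on $[0, 1)$ in line with the convention above) and $c = a + b + 1 > 0$; this choice and the function names are illustrative assumptions.

```python
import math

def H(y, b=1.0):
    """Assumed example: y^b (1 + log y), regularly varying with exponent b on [1, oo)."""
    return y ** b * (1 + math.log(y))

def U(t, a=0.0, b=1.0):
    """Closed form of int_1^t y^a H(y) dy for c = a + b + 1 > 0."""
    c = a + b + 1
    tc = t ** c
    return (tc - 1) / c + tc * math.log(t) / c - (tc - 1) / c ** 2

def karamata_ratio(t, a=0.0, b=1.0):
    """t^{a+1} H(t) / U_a(t), which should tend to c = a + b + 1."""
    return t ** (a + 1) * H(t, b) / U(t, a, b)

if __name__ == "__main__":
    for t in (1e3, 1e6, 1e12):
        print(t, karamata_ratio(t))   # approaches c = 2 slowly
```

With $a = 0$, $b = 1$ the limit is $c = 2$; the error behaves like $1/\log t$, so the convergence is visible but slow.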


Proof. The argument for (i) and (ii) is the same, and we shall prove (i).
Set

$$(1)\qquad h(y)/y = y^a H(y)/V_a(y).$$

Since

$$y^a H(y) = -\frac{dV_a(y)}{dy},$$

upon integrating (1) over $[t, tx)$ with $x > 1$, it follows that

$$(2)\qquad \log \frac{V_a(t)}{V_a(tx)} = \int_t^{tx} \frac{h(y)}{y}\,dy = h(t) \int_1^x \frac{h(tz)}{h(t)} \frac{1}{z}\,dz.$$

Let $H$ vary regularly with exponent $b$ so that, by a, $V_a$ varies regularly with
exponent $a + b + 1 = -c$. Thus, both sides of (1) vary regularly with
exponent $-1$ and $h$ varies slowly. Therefore, as $t \to \infty$, the integrand in
the last integral in (2) tends to $1/z$ while the first term in (2) tends to
$c \log x$ and Fatou-Lebesgue theorem implies that $\limsup h(t) \le c$.
Thus, $h$ is bounded so that there is a sequence $t_n \to \infty$ with $h(t_n) \to
c' \le c < \infty$. Since $h$ varies slowly, $h(t_n y) \to c'$ for every $y > 0$ and, by
the dominated convergence theorem, $c \log x = c' \log x$ hence $c' = c$ for
every such sequence $(t_n)$. Therefore, $h(t) \to c$ as $t \to \infty$ and the direct
assertion is proved.


Conversely, if the limit $c \ge 0$ exists so that $h(t) \to c$ as $t \to \infty$ then,
by (2), $V_a$ varies regularly with exponent $-c$. Moreover if $c > 0$ then this
property of $V_a$ together with (1) implies regular variation of $H$ with
exponent $-c - a - 1 = b$. This completes the proof of (i) and (ii) is
proved similarly.


*C. SLOW VARIATION CRITERION. $H$ varies slowly if and only if

$$H(x) = h(x) \exp\Big\{\int_1^x \frac{g(y)}{y}\,dy\Big\}$$

where $g(x) \to 0$ and $h(x) \to c < \infty$ as $x \to \infty$.


Proof. The "if" assertion is easily verified. As for the "only if"
assertion, let $H$ vary slowly. Then, by B(ii) with $a = b = 0$,
$H(t)/U_0(t) = (1 + g(t))/t$ with $g(t) \to 0$ as $t \to \infty$.
Since $H(t) = dU_0(t)/dt$, upon integrating over $[1, x)$ with $x > 1$, it follows
that

$$U_0(x) = U_0(1)\,x \exp\Big\{\int_1^x \frac{g(y)}{y}\,dy\Big\}.$$

But, by B(ii),

$$H(x) = h(x)U_0(x)/x,$$

and the "only if" assertion obtains.


COROLLARY. If $H$ varies slowly then, as $x \to \infty$, $H(x + y)/H(x) \to 1$
and, given $\delta > 0$, $x^{-\delta}H(x) \to 0$, $x^{\delta}H(x) \to \infty$, and $x^{-\delta} < H(x) < x^{\delta}$
from some $x$ on.


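*Let $G$ be a d.f. vanishing on $(-\infty, 0)$. Let $x > 0$ be finite and set

$$\mu_{\alpha}(x) = \int_0^x y^{\alpha}\,dG(y), \qquad \nu_{\beta}(x) = \int_x^{\infty} y^{\beta}\,dG(y).$$

Since we are concerned only with asymptotic behaviour of these
integrals, whenever convenient we do take $G = 0$ in some neighborhood of
the origin. We assume that

$$\mu_{\alpha}(\infty) = \lim \mu_{\alpha}(x) = \infty, \qquad \nu_{\beta}(\infty) = \lim \nu_{\beta}(x) = 0$$

so that $\alpha > 0$ and $-\infty < \beta < \alpha$.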


The elementary integration by parts which follows will reduce the 
question of regular variation of ue and of vg to the main Karamata 
theorem. 


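b. INTEGRATION BY PARTS LEMMA. Let $x$ be a continuity point of $G$
hence of $\mu_{\alpha}$ and of $\nu_{\beta}$. Then

$$\text{(i)}\qquad \mu_{\alpha}(x) = -x^{\alpha-\beta}\nu_{\beta}(x) + (\alpha - \beta) \int_0^x y^{\alpha-\beta-1}\nu_{\beta}(y)\,dy$$

$$\text{(ii)}\qquad \nu_{\beta}(x) = -x^{\beta-\alpha}\mu_{\alpha}(x) + (\alpha - \beta) \int_x^{\infty} y^{\beta-\alpha-1}\mu_{\alpha}(y)\,dy.$$


Proof. Relation (i) results at once from integration by parts of
Stieltjes integrals. Relation (ii) requires also a passage to the limit:
Integration by parts on $[x, t)$ with $t > x$ continuity point of $G$ yields

$$(1)\qquad \nu_{\beta}(x) - \nu_{\beta}(t) = -x^{\beta-\alpha}\mu_{\alpha}(x) + t^{\beta-\alpha}\mu_{\alpha}(t) + (\alpha - \beta) \int_x^t y^{\beta-\alpha-1}\mu_{\alpha}(y)\,dy.$$

Thus,

$$(\alpha - \beta) \int_x^t y^{\beta-\alpha-1}\mu_{\alpha}(y)\,dy \le \nu_{\beta}(x) + x^{\beta-\alpha}\mu_{\alpha}(x)$$

so that, letting $t \to \infty$, the limit of the integral on the left is finite. Since
$\mu_{\alpha}$ is nondecreasing, as $t \to \infty$,

$$\frac{1 - 2^{\beta-\alpha}}{\alpha - \beta}\,t^{\beta-\alpha}\mu_{\alpha}(t) \le \int_t^{2t} y^{\beta-\alpha-1}\mu_{\alpha}(y)\,dy \to 0$$

hence $t^{\beta-\alpha}\mu_{\alpha}(t) \to 0$ and, letting $t \to \infty$ in (1), (ii) obtains.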


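*D. VARIATION OF TRUNCATED MOMENTS. Let $\mu_{\alpha}(\infty) = \infty$ and
$\nu_{\beta}(\infty) = 0$ so that $\alpha > 0$ and $-\infty < \beta < \alpha$.
(i) If $\mu_{\alpha}$ or $\nu_{\beta}$ varies regularly, then, as $x \to \infty$,

$$x^{\alpha-\beta}\nu_{\beta}(x)/\mu_{\alpha}(x) \to c = \frac{\alpha - \gamma}{\gamma - \beta} \ge 0, \qquad \beta \le \gamma \le \alpha.$$

(ii) Conversely, if this limit exists then, for $\beta < \gamma < \alpha$, $\mu_{\alpha}$ and $\nu_{\beta}$ vary
regularly with exponents $a = \alpha - \gamma > 0$ and $b = \beta - \gamma < 0$, respectively,
while $a = 0$ when $\gamma = \alpha$ and $b = 0$ when $\gamma = \beta$.


Note that in the boundary cases, while $\mu_{\alpha}$ varies slowly when $\gamma = \alpha$ and
$\nu_{\beta}$ varies slowly when $\gamma = \beta$, nothing can be asserted regarding $\nu_{\beta}$ or
$\mu_{\alpha}$, respectively.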


Proof. 1°. Let ue vary regularly with exponent uw. Finiteness of the 
integral in b(ii) yields wu Sa — 8. Since pe is nondecreasing u 2 0. 
Thus, setting u =a —y, we have 8 Sy Sawithy 20. Now, bi) 
yields 


CO 


me By (x) a— Bp 
BN yb-e-ly, (dy) 


= —1+ xan (x) J 


(1) 
so that, using BG) with H = wz anda = B —a — l, as x— ©, 


xy (x) /Ma(x) > —1 + (a — 8)/(y — B) = (@— v)/(y — B) = 6 


with c = » when-y = B, and this is the asserted limit. Let vg vary 
regularly with exponent vso that v S$ B. Since vg is nonincreasing v S 0. 
Thus, setting 0 = B — y, we have 8 Sy Sawithy 2 0. Proceeding 
as above but with b(i) in lieu of b(ii) and using B(i) but with H = vz 
and a@ = a — $8 — 1, once more the asserted limit obtains and (i) is 
proved. 


2°. Conversely, let the limit $c = (\alpha - \gamma)/(\gamma - \beta)$ exist. If $0 <
c < \infty$ then (1) yields, as $x \to \infty$,

$$(2)\qquad x^{\beta-\alpha}\mu_{\alpha}(x) \Big/ \int_x^{\infty} y^{\beta-\alpha-1}\mu_{\alpha}(y)\,dy \to (\alpha - \beta)/(c + 1) = \gamma - \beta.$$

Using B(i), it follows that $\mu_{\alpha}$ varies regularly with exponent $\alpha - \gamma > 0$
while, by (1), $\nu_{\beta}$ varies regularly with exponent $\beta - \gamma < 0$. If $c = 0$
the same argument shows that $\mu_{\alpha}$ varies slowly but yields nothing about
$\nu_{\beta}$. Similarly, if $c = \infty$ then $\nu_{\beta}$ varies slowly but nothing can be asserted
about $\mu_{\alpha}$.

The proof is terminated.


25.2 Domains of attraction. Throughout this subsection, $X_1, X_2, \cdots$
are iid r.v.'s with common law $\mathcal{L}(X)$, d.f. $F$, ch.f. $f$ and
$S_n = X_1 + \cdots + X_n$, $n = 1, 2, \cdots$; we take $x > 0$ and set

$$\mu_2(x) = \int_{-x}^{+x} y^2\,dF(y), \qquad q(x) = 1 - F(x) + F(-x).$$

We say that $\mathcal{L}(X)$ belongs to the domain of attraction of a law $\mathcal{L}(Y)$, or
is attracted by $\mathcal{L}(Y)$ (an attracting law), if there are $a_n$ and $b_n > 0$ such
that $\mathcal{L}(S_n/b_n - a_n) \to \mathcal{L}(Y)$. We exclude the trivial case of degenerate
attracting laws $\mathcal{L}(Y)$ for, according to 14.2, every $\mathcal{L}(X)$ belongs to its
domain of attraction with suitable $a_n$ and $b_n$, and this excludes
consideration of degenerate $\mathcal{L}(X)$. In fact, always according to 14.2, the above
definition pertains not to individual laws but to types of laws.

In terms of ch.f.'s, $\mathcal{L}(X)$ is attracted by $\mathcal{L}(Y)$ nondegenerate means
that, for every $u \in R$,

$$e^{-iua_n} f^n(u/b_n) \to f_Y(u) \ \text{ nondegenerate.}$$

Thus, ch.f.'s $|f(u/b_n)|^{2n} \to |f_Y(u)|^2$, so that $|f(u/b_n)|^2 \to 1$ with
nondegenerate $|f_Y|^2$ hence $b_n \to \infty$. It follows that also $\mathcal{L}(S_n/b_{n+1} - a_n) \to
\mathcal{L}(Y)$, that is, $|f(u/b_{n+1})|^{2n} \to |f_Y(u)|^2$ and, by the Corollary to 14.2A,
$b_{n+1}/b_n \to 1$:


a. If $\mathcal{L}(S_n/b_n - a_n) \to \mathcal{L}(Y)$ nondegenerate, then $b_n \to \infty$ and
$b_{n+1}/b_n \to 1$.


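Since $f(u/b_n) \to 1$, 24.5C applies with $X_{nk} = X_k/b_n$ hence $F_n(x) =
F(b_n x)$, $k = 1, \cdots, n$, and


b. $\mathcal{L}(S_n/b_n - a_n) \to \mathcal{L}(Y)$ nondegenerate, necessarily i.d. with $\psi =
(\alpha, \beta^2, L)$, if and only if

$(C_L)$: $L_n \xrightarrow{w} L$ where $L_n(-x) = nF(-b_n x)$, $L_n(x) = n(F(b_n x) - 1)$.

$(C_{\beta^2})$: $n\mu_2(b_n x)/b_n^2 = n \int_{-x}^{+x} y^2\,dF(b_n y) \to \beta^2$ as $n \to \infty$ then $x \to 0$.

$(C_\alpha)$: $a_n = \alpha_n - \alpha + o(1)$ where $\alpha_n = n \int \frac{x}{1+x^2}\,dF(b_n x)$.
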
With the help of these lemmas, used without further comment, we
begin by investigating condition $(C_L)$ and its implications for Lévy
functions of (nondegenerate) attracting laws $\mathcal{L}(Y)$. Clearly, the Lévy
function for normal $\mathcal{L}(Y)$ is $L \equiv 0$, and conversely. The others are
given by


A. LÉVY FUNCTIONS AND $(C_L)$. Let $x > 0$.
(i) Lévy functions $L_{\gamma}$ of nonnormal attracting laws $\mathcal{L}(Y)$ are given by

$$L_{\gamma}(-x) = cp/x^{\gamma}, \qquad L_{\gamma}(x) = -cq/x^{\gamma}$$

where

$$0 < \gamma < 2, \quad c > 0, \quad p, q \ge 0 \text{ with } p + q = 1.$$

(ii) Condition $(C_L)$ is: as $x \to \infty$,

$$F(-x)/q(x) \to p \quad\text{or}\quad (1 - F(x))/q(x) \to 1 - p$$

and

$$x^{\gamma}q(x) = (c + o(1))h(x) \ \text{ where } h(x) \text{ varies slowly.}$$

The admissible $b_n$ are characterized by $nq(b_n) \to c$ as $n \to \infty$.
Proof. Condition $(C_L)$ reads: for $\pm x \in C(L)$, as $n \to \infty$,

(1) $nF(-b_n x) \to L(-x)$ and (2) $n(F(b_n x) - 1) \to L(x)$

hence

(3) $nq(b_n x) \to L(-x) - L(x)$.

In fact, any two of these three relations clearly imply the remaining one.


1°. Since $L \equiv 0$ is excluded, there is an $x_0 > 0$ such that $L(-x_0) -
L(x_0) > 0$ hence $L(-x) - L(x)$, being nonincreasing with increasing $x$,
is positive for $x \in (0, x_0]$. It follows that the Corollary of 25.1A applies
to (3) so that, setting $L_{\gamma} = L$, as $n \to \infty$,

$$(4)\qquad nq(b_n x) \to L_{\gamma}(-x) - L_{\gamma}(x) = c/x^{\gamma}$$

with $c > 0$ and $\gamma > 0$.
On the other hand, upon changing in (1) and (3) the fixed $x$ into fixed
$y$ and for every $x > 0$ selecting $n$ to be the smallest integer such that
$b_n y \le x < b_{n+1} y$, we obtain

$$\frac{n}{n+1} \cdot \frac{(n+1)F(-b_{n+1}y)}{nq(b_n y)} \le \frac{F(-x)}{q(x)} \le \frac{n+1}{n} \cdot \frac{nF(-b_n y)}{(n+1)q(b_{n+1}y)}.$$

Upon letting $x \to \infty$ so that $n \to \infty$, the extreme sides converge to
$p = L_{\gamma}(-y)/(L_{\gamma}(-y) - L_{\gamma}(y))$ so that

(5) $F(-x)/q(x) \to p$ with $0 \le p \le 1$

equivalently

(6) $(1 - F(x))/q(x) \to 1 - p$.

Thus, replacing in (5) $x$ by $b_n x$ with $x$ arbitrary but fixed, as $n \to \infty$,

$$nF(-b_n x)/nq(b_n x) \to p$$

hence, by (4) and (1),

(7) $nF(-b_n x) \to L_{\gamma}(-x) = cp/x^{\gamma}$

and, similarly,

(8) $n(1 - F(b_n x)) \to -L_{\gamma}(x) = cq/x^{\gamma}$.

Since the requirement for any Lévy function, $\fint_{-x}^{x} y^2\,dL(y)$ finite, is
satisfied if and only if $\gamma < 2$, we must have $0 < \gamma < 2$. Thus (i), the
asserted form of Lévy functions $L_{\gamma}$ of nonnormal attracting laws $\mathcal{L}(Y)$,
is established.

2°. Condition $(C_L)$ became:

(5) $\lim F(-x)/q(x) = p$, $0 \le p \le 1$,

and

(4) $\lim nq(b_n x) = c/x^{\gamma}$, $c > 0$, $0 < \gamma < 2$.

According to the Corollary of 25.1A (4) implies

(9) $x^{\gamma}q(x) = (c + o(1))h(x)$ with $h(x)$ varying slowly.

On the other hand, setting $x = 1$ in (4), the scale factors $b_n$ must satisfy

(10) $\lim nq(b_n) = c > 0$.

Thus, if $\mathcal{L}(S_n/b_n - a_n) \to \mathcal{L}(Y)$ nonnormal then (5), (9) and (10) hold.
Conversely, let (5), (9) and (10) hold. From (10) it follows that $b_n =
\inf\{x: q(x - 0) \ge c/n \ge q(x + 0)\} \to \infty$ hence

$$\lim nq(b_n x)/c = \lim q(b_n x)/q(b_n) = x^{-\gamma} \lim h(b_n x)/h(b_n) = x^{-\gamma},$$

and $\lim nq(b_n x) = c/x^{\gamma}$. Thus, $(C_L)$ holds and admissible $b_n$ satisfy
(10). The proof is terminated.


Remark. Clearly, $(C_L)$ can also be stated in a more symmetric
form:

$$F(-x) = cp(1 + o(1))h(x)/x^{\gamma} \quad\text{and}\quad 1 - F(x) = cq(1 + o(1))h(x)/x^{\gamma}.$$

In order to complete A we need more of Karamata theory. We write
v.s. for "varies slowly."
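The characterization of admissible scale factors by $nq(b_n) \to c$ can be tried out numerically. The sketch below assumes a tail $q(x) = (1 + \log x)/x^{\gamma}$ for $x \ge 1$ with $\gamma = 1.5$ and $c = 1$ (a valid nonincreasing tail with slowly varying $h(x) = 1 + \log x$); the tail, the bisection solver, and the function names are illustrative assumptions, not part of the text.

```python
import math

def q(x, gamma=1.5):
    """Assumed two-sided tail: q(x) = (1 + log x) / x^gamma for x >= 1, and 1 below."""
    return 1.0 if x <= 1 else (1 + math.log(x)) / x ** gamma

def scale_factor(n, c=1.0, gamma=1.5):
    """Solve n q(b) = c for b by bisection on a geometric scale (q is decreasing on [1, oo))."""
    lo, hi = 1.0, 2.0
    while n * q(hi, gamma) > c:
        hi *= 2
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if n * q(mid, gamma) > c:
            lo = mid
        else:
            hi = mid
    return hi

def tail_limit_error(n, x, c=1.0, gamma=1.5):
    """Relative distance between n q(b_n x) and the limit c / x^gamma."""
    b_n = scale_factor(n, c, gamma)
    return abs(n * q(b_n * x, gamma) * x ** gamma / c - 1)

if __name__ == "__main__":
    for n in (10 ** 4, 10 ** 8, 10 ** 12):
        print(n, scale_factor(n), tail_limit_error(n, 2.0))
```

Because of the slowly varying factor, $nq(b_n x)$ approaches $c/x^{\gamma}$ only at rate $\log x/\log b_n$, which the printed errors reflect.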


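*c. SLOW VARIATION LEMMA. Let $x \to \infty$.
(i) If $\mu_2(\infty) = \infty$ then

for $0 < \gamma < 2$:

$$x^2 q(x)/\mu_2(x) \to (2 - \gamma)/\gamma \iff \mu_2(x)/x^{2-\gamma} \text{ v.s.} \iff x^{\gamma}q(x) \text{ v.s.}$$

for $\gamma = 2$:

$$x^2 q(x)/\mu_2(x) \to 0 \iff \mu_2(x) \text{ v.s.}$$

(ii) $0 < \mu_2(\infty) < \infty \implies x^2 q(x)/\mu_2(x) \to 0$ and $\mu_2(x)$ v.s.


Proof. If $\mu_2(\infty) = \infty$ then (i) follows from 25.1D with $G(x) = F(x) -
F(-x)$, $\alpha = 2$, and $\beta = 0$, so that $\nu_0(x) = q(x)$.

If $0 < \mu_2(\infty) < \infty$ then $x^2 q(x) \le \int_{|y| \ge x} y^2\,dF(y) \to 0$ consequently
$x^2 q(x)/\mu_2(x) \to 0$ while, clearly, $\mu_2(tx)/\mu_2(t) \to 1$ as $t \to \infty$, that is, $\mu_2(x)$
varies slowly.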
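For a concrete check of the limit in (i), consider a law whose two-sided tail is exactly $q(x) = x^{-\gamma}$ for $x \ge 1$ with $0 < \gamma < 2$; then $\mu_2(x) = \gamma(x^{2-\gamma} - 1)/(2 - \gamma)$ in closed form. The sketch below, with this assumed Pareto-type tail and hypothetical function names, evaluates $x^2 q(x)/\mu_2(x)$ and compares it with $(2 - \gamma)/\gamma$.

```python
def tail_q(x, gamma):
    """Assumed Pareto-type two-sided tail: q(x) = x^{-gamma} for x >= 1, and 1 below."""
    return 1.0 if x <= 1 else x ** (-gamma)

def mu2(x, gamma):
    """Truncated second moment: integral of y^2 * gamma * y^{-gamma-1} over [1, x]."""
    return 0.0 if x <= 1 else gamma * (x ** (2 - gamma) - 1) / (2 - gamma)

def moment_ratio(x, gamma):
    """x^2 q(x) / mu_2(x), which should tend to (2 - gamma) / gamma."""
    return x * x * tail_q(x, gamma) / mu2(x, gamma)

if __name__ == "__main__":
    for gamma in (0.5, 1.0, 1.5):
        print(gamma, moment_ratio(1e8, gamma), (2 - gamma) / gamma)
```

Here there is no slowly varying correction, so the ratio converges at the polynomial rate $x^{-(2-\gamma)}$.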


Remark. Recall that when $0 < \mu_2(\infty) < \infty$ then, taking $X$ centered
at its expectation and setting $\sigma^2 = \sigma^2 X = \mu_2(\infty)$, $\mathcal{L}(S_n/\sigma\sqrt{n}) \to \mathcal{N}(0,1)$
since

$$f^n(u/\sigma\sqrt{n}) = \Big(1 - \frac{u^2}{2n}(1 + o(1))\Big)^n \to e^{-u^2/2}.$$

Thus, when $0 < \mu_2(\infty) < \infty$ then $\mathcal{L}(X)$ is attracted by normal $\mathcal{L}(Y)$,
and other types of attracting laws may happen only when $\mu_2(\infty) = \infty$.


We say that $\mathcal{L}(X)$ is stable if, for every $n$, there are $a_n$ and $b_n > 0$ such
that $\mathcal{L}(S_n/b_n - a_n) = \mathcal{L}(X)$; clearly, stable laws are attracted by
themselves. Note that these are "stable" laws introduced in 24.4. We write
$L_{\gamma}(c, p)$ for $L_{\gamma}$ characterized by $c$ and $p$ as in A(i).
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B. STABILITY AND ATTRACTION CRITERIA. Let x > 0. 


(i) The family of all nondegenerate attracting laws consists of all non- 
degenerate stable laws. They are t.d. laws with p, = (a, B,?, Ly), 0 < 
y S 2 and 


orO0<¥ <2: 
BY =0, Ly(—x«) = ep/x7, Lyx) = cq/x’, 
wherec>0,2,¢20,p+¢ =1; 
fory =2: 
62 >0, LL, = 0. 
(11) &CX) zs attracted by some Ly with given y © (0,2] if and only if, as 


eee ©, 


xq (x) /ma(%) > (2 — y)/v. 
&(X) is attracted by &, with given L,(c,p) if and only if, asx — ~, 
forO<v7 <2: 
E'(—x)/q(x) > py g(x) = c(1 + 0(1))h@)/x7 


where h(x) varies slowly, and admissible b, are characterized by nq(bn) > ¢ 
asn—-o; 


fory = 2: 
u(x) varies slowly and admissible b, are characterized by ny2(b,)/b2 —> 
B2 > OQ asn—-o., 


In either case, admissible an are characterized by 


x 
Gn = An — a+ (1) where an =n jets AF (bnX). 


Proof. Stability assertion is immediate. For, every stable law is attracted
by itself while, conversely, the attracting laws $\mathcal{L}_\gamma$ are stable for
$b_n = n^{1/\gamma}$; use the form of Lévy functions $L_\gamma$ in A(i).

In A, we already found, for $0 < \gamma < 2$, $(C_{L_\gamma})$ and the $L_\gamma$ as well as a
characterization of admissible $b_n$. It remains to examine

$(C_\beta)$: $n\mu_2(b_n x)/b_n^2 \to \beta_\gamma^2$ as $n \to \infty$ then $x \to 0$,
and to find admissible $b_n$ for $\gamma = 2$.
1°. Nonnormal case: $0 < \gamma < 2$. $(C_{L_\gamma})$ is given by: as $x \to \infty$
(1) $F(-x)/q(x) \to p$, $0 \le p \le 1$,
and
(2) $q(x) = c\,(1 + o(1))\,h(x)/x^\gamma$, $h(x)$ slowly varying, $c > 0$,
or, when not specifying $c$,

(3) $x^\gamma q(x)$ varies slowly,

while admissible $b_n$ are characterized by

(4) $n\,q(b_n) \to c > 0$ as $n \to \infty$.


We must have $\mu_2(\infty) = \infty$, for $0 < \mu_2(\infty) < \infty$ implies normality,
that is, $\gamma = 2$ with $L_\gamma \equiv 0$. But then c(i) applies and (3) is equivalent
to: as $x \to \infty$,

(5) $x^2 q(x)/\mu_2(x) \to (2 - \gamma)/\gamma$
and to
(6) $\mu_2(x)/x^{2-\gamma}$ varies slowly.

Upon replacing $x$ by $b_n$ in (5) and using (4), as $n \to \infty$, we obtain
(7) $n\mu_2(b_n)/b_n^2 \to c' = c\gamma/(2 - \gamma) > 0$.
But (6) implies that as $n \to \infty$

$$\frac{n\mu_2(b_n x)/b_n^2}{n\mu_2(b_n)/b_n^2} \to x^{2-\gamma}$$

hence, by (7),
(8) $n\mu_2(b_n x)/b_n^2 \to c' x^{2-\gamma}$.
Therefore, for $0 < \gamma < 2$, $(C_\beta)$ becomes
$0 \leftarrow n\mu_2(b_n x)/b_n^2 \to \beta_\gamma^2$ as $n \to \infty$ then $x \to 0$,
and we have the asserted $\psi_\gamma = (\alpha, 0, L_\gamma)$, and convergence.


2°. Normal case: $\gamma = 2$. Nondegenerate normal laws correspond to
$\psi_2 = (\alpha, \beta_2^2, 0)$ with $\beta_2^2 > 0$. $(C_{L_\gamma})$ and $(C_\beta)$ become: as $n \to \infty$,

(1) $n\,q(b_n x) \to 0$
and
(2) $n\mu_2(b_n x)/b_n^2 \to \beta_2^2$, $0 < \beta_2^2 < \infty$;

setting $x = 1$, admissible $b_n$ are characterized by

(3) $n\mu_2(b_n)/b_n^2 \to \beta_2^2$.
If $0 < \mu_2(\infty) < \infty$ so that, by c(ii), $\mu_2(x)$ varies slowly and
$x^2 q(x)/\mu_2(x) \to 0 = (2 - \gamma)/\gamma$ for $\gamma = 2$, then it is easily seen that for $X$
centered at its expectation, (1) and (2) hold with $b_n = \sigma\sqrt{n}$, $\sigma^2 = \sigma^2 X = \mu_2(\infty)$,
and we have the required convergence (as we already knew).
Thus, it remains to consider the case $\mu_2(\infty) = \infty$. Then, by c(i), as
$x \to \infty$, $x^2 q(x)/\mu_2(x) \to 0$ is equivalent to $\mu_2(x)$ varying slowly.

Let $\mu_2(x)$ vary slowly so that $\mu_2(x)/x^2 \to 0$ as $x \to \infty$. Then (3) holds
for $b_n = \sup\{x : \mu_2(x)/x^2 = \beta_2^2/n\}$ and $b_n \to \infty$, so that
$\lim \mu_2(b_n x)/\mu_2(b_n) = 1$ becomes, by (3), $n\mu_2(b_n x)/b_n^2 \to \beta_2^2 > 0$, that is, (2) holds.
Since $\lim x^2 q(x)/\mu_2(x) = 0$, upon replacing therein $x$ by $b_n x$ with $x > 0$
arbitrary but fixed, we have

$$\lim \frac{x^2\, n\,q(b_n x)}{n\mu_2(b_n x)/b_n^2} = 0$$

hence, by (2), $\lim n\,q(b_n x) = 0$, that is, (1) holds. Thus, $\mu_2(x)$ varying
slowly implies $\mathcal{L}(S_n/b_n - a_n) \to \mathcal{L}_2$ for admissible $a_n$.


Conversely, let $\mathcal{L}(S_n/b_n - a_n) \to \mathcal{L}_2$, so that (1) and (2) hence (3)
hold. We prove that $\mu_2(x)$ varies slowly, that is, $(\mu_2(xt) - \mu_2(x))/\mu_2(x) \to 0$
as $x \to \infty$ for, say, $t > 1$. Let $x \to \infty$ and let $n$ be such that
$b_n \le x < b_{n+1}$ so that $n \to \infty$. Then, since $b_n \to \infty$, (3) implies that
$n\mu_2(x)/b_n^2 \to \beta_2^2 > 0$, that is, $\mu_2(x) \sim \beta_2^2 b_n^2/n$. Since $b_{n+1}/b_n \to 1$, by (1),

$$\mu_2(xt) - \mu_2(x) \le -\int_{b_n}^{t b_{n+1}} y^2\,dq(y) \le t^2 b_{n+1}^2\, q(b_n) = t^2 (b_{n+1}^2/b_n^2)(b_n^2/n)\, n\,q(b_n) = o(b_n^2/n),$$

and the assertion follows.
The proof is terminated.


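CONSEQUENCES
1°. For stable laws

$$\psi_\gamma(u) = i\alpha u - c|u|^\gamma \Bigl(1 + ib\,\frac{u}{|u|}\,h_\gamma(u)\Bigr), \quad u \in R, \quad 0 < \gamma \le 2,$$

with $c > 0$ ($c = 0$ for degenerate laws), $b = p - q$ hence $|b| \le 1$ and

$$h_\gamma(u) = \tan\frac{\pi}{2}\gamma \ \text{ or } \ \frac{2}{\pi}\log|u| \ \text{ according as } \gamma \ne 1 \text{ or } \gamma = 1.$$

Follows from B(i) by the computations in part 2° of the proof of 24.4B
where $\beta$ is replaced by $cp$, $\beta'$ by $cq$, and $b$ and $c$ are interchanged.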


2°. Nondegenerate stable df.’s F, are infinitely differentiable and 
| FY | s | FO) | positive, for every n = 1, 2,++-. 
Proof. Since, by 1°, $|f_\gamma(u)| = \exp(-c|u|^\gamma)$ with $0 < \gamma \le 2$, $c > 0$, $f_\gamma$ is
integrable and so are the functions with values $u^n |f_\gamma(u)|$ for every $n$.
Therefore, the inversion formula becomes

$$F_\gamma(x) - F_\gamma(a) = \frac{1}{2\pi} \int \frac{e^{-iua} - e^{-iux}}{iu}\, f_\gamma(u)\,du$$

and we can differentiate $n$ times under the integral sign for the integral
so obtained is absolutely convergent so that

$$F_\gamma^{(n)}(x) = \frac{(-i)^{n-1}}{2\pi} \int u^{n-1} e^{-iux} f_\gamma(u)\,du.$$
It follows that | F,™(«) | S | F,~ (0) | > 0. 
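Let $q(x) = 1 - F_\gamma(x) + F_\gamma(-x)$ and let $\mathcal{L}_\gamma$ be nondegenerate.
3°. If $\mathcal{L}_\gamma$ is a stable law with $0 < \gamma < 2$, then $x^\gamma q(x) \to c > 0$ as
$x \to \infty$.

Proof. We know that $\mathcal{L}_\gamma$ attracts itself with scale factors $b_n = n^{1/\gamma}$
(also true for $\gamma = 2$); this also follows from $|f_\gamma(u)| = \exp(-c|u|^\gamma)$
since $|f_\gamma^n(u/n^{1/\gamma})| = |f_\gamma(u)|$. Therefore, by B(ii), replacing $b_n$ by $n^{1/\gamma}$ in
$n\,q(b_n) \to c > 0$, we have $(n^{1/\gamma})^\gamma q(n^{1/\gamma}) \to c$. Since $q(x)$ is nonincreasing
with $x$ increasing, taking $n^{1/\gamma} \le x \le (n + 1)^{1/\gamma}$, we obtain

$$\frac{n}{n+1}\,(n+1)\,q((n+1)^{1/\gamma}) \le x^\gamma q(x) \le \frac{n+1}{n}\, n\,q(n^{1/\gamma}),$$

where the extreme terms tend to $c$ as $x \to \infty$ hence $n \to \infty$, and
$x^\gamma q(x) \to c$.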
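As a concrete instance of 3°, take the standard Cauchy law, which is stable with $\gamma = 1$. Its tail is $q(x) = 1 - (2/\pi)\arctan x = (2/\pi)\arctan(1/x)$ for $x > 0$, so $x\,q(x)$ should tend to $2/\pi$. The short sketch below (function names are illustrative, and the constant $2/\pi$ is specific to this normalization of the Cauchy law) evaluates the product for growing $x$.

```python
import math


def cauchy_tail(x):
    """q(x) = P(|X| > x) for the standard Cauchy law, x > 0.

    Uses the identity 1 - (2/pi) atan(x) = (2/pi) atan(1/x), which avoids
    cancellation for large x.
    """
    return (2.0 / math.pi) * math.atan(1.0 / x)


def scaled_tail(x, gamma=1.0):
    """x^gamma * q(x); for the Cauchy law (gamma = 1) this tends to 2/pi."""
    return x ** gamma * cauchy_tail(x)


if __name__ == "__main__":
    for x in (1.0, 10.0, 1e3, 1e6):
        print(x, scaled_tail(x), 2.0 / math.pi)
```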


4°. If $\mathcal{L}(X)$ is attracted by $\mathcal{L}_\gamma$, then
(i) $E|X|^r < \infty$ for $0 \le r < \gamma \le 2$,
(ii) $E|X|^r = \infty$ for $r > \gamma$ when $0 < \gamma < 2$.

If $\mathcal{L}(X) = \mathcal{L}_\gamma$ with $0 < \gamma < 2$, then $E|X|^r$ is finite or infinite according as
$0 \le r < \gamma$ or $r \ge \gamma$.

Proof. If $\mu_2(\infty) = EX^2 < \infty$ then, by 9.3a, $E|X|^r < \infty$ for $r < \gamma = 2$,
while $E|X|^r$ may be finite or infinite for $r > 2$. This shows why
$\gamma = 2$ is to be excluded from (ii) and also that it suffices to prove (i)
when $\mu_2(\infty) = \infty$, even for $\gamma = 2$. Then, by c, as $x \to \infty$,

$$x^\gamma q(x) = c\,(1 + o(1))\,h(x),$$

where $h(x)$ is slowly varying hence, by the Corollary of 25.1C, given
$\delta > 0$ there is an $a$ such that, for $x \ge a$,

$$x^{-\delta} < h(x) < x^{\delta}.$$




On the other hand, by integration by parts,

$$E|X|^r = \int_{-\infty}^{+\infty} |x|^r\,dF(x) = r \int_0^\infty x^{r-1} q(x)\,dx$$

so that $E|X|^r$ is finite or infinite according as $\int_a^\infty x^{r-\gamma-1} h(x)\,dx$ is finite
or infinite. Since, given $\delta > 0$, for $x \ge a$,

$$x^{r-\gamma-1-\delta} < x^{r-\gamma-1} h(x) < x^{r-\gamma-1+\delta},$$

it follows that $E|X|^r < \infty$ when $\lim_{x \to \infty} x^{r-\gamma+\delta} < \infty$ and $E|X|^r = \infty$
when $\lim_{x \to \infty} x^{r-\gamma-\delta} = \infty$.

If $0 < r < \gamma \le 2$ then there is a positive $\delta < \gamma - r$, the first limit is
finite, $E|X|^r < \infty$ and (i) is proved.

If $r > \gamma$ with $\gamma < 2$ then there is a positive $\delta < r - \gamma$, the second
limit is infinite, $E|X|^r = \infty$ and (ii) is proved. It remains to show that
when $\mathcal{L}(X) = \mathcal{L}_\gamma$ with $0 < \gamma < 2$ then $E|X|^\gamma = \infty$. Since, by 3°,
$x^\gamma q(x) \to c > 0$, $x^{\gamma-1} q(x) \sim c/x$ for $x \to \infty$ so that $\int^\infty x^{\gamma-1} q(x)\,dx = \infty$,
$E|X|^\gamma = \infty$ and the proof is concluded.


§26. RANDOM WALK 


Random walks, sequences of consecutive sums of iid summands, are
present, in various guises and various degrees of generality, in an in- 
credibly huge literature of applications of pr. theory to a very large 
number of concrete problems: queuing processes connected with mass 
service, dams, waiting times, renewal processes connected with storage 
and inventories, risk theory, traffic flow, particle counters, and many 
others. The present general random walk theory is relatively recent. 

In 1921, Pólya discovers “recurrence” and “nonrecurrence” phenomena
in his study of some simple random walks on lattices in $R$, $R^2$,
and $R^3$. Thirty years later, in a definitive work, Chung and Fuchs
settle this dichotomy problem for general random walks. Fluctuation
r.v.’s defined on the $n$ first terms of the random walk appear in the
concrete problems mentioned above. But it is only in 1949 that Andersen
begins his investigations into these r.v.’s for the general random
walk. Since then a large number of results were obtained by many
authors. They use variants of either the combinatorial or the analytic
methods.


The combinatorial method initiated by Andersen threw the doors 
wide open. His approach was very involved. Spitzer simplified and 
unified the combinatorial approach and obtained some of the most im- 
portant identities and limit theorems of the theory. His book, while 
devoted to random walk on lattices only, contains a number of deep 
ideas and significant examples. Feller, using ladder indices and ladder 
variables, first introduced by Blackwell, reduced the combinatorial ap- 
proach to elementary mathematical arguments and using Feller’s ap- 
proach, Port, in a semi-expository paper, obtained a large number of 
known identities and generalized some of them. 


The analytic method, as used by Pollaczek since 1930, was very
involved and his work remained unnoticed until some of his results
were rediscovered. Ray, Kemperman, Baxter, Wendel, . . . , simplified
and unified in various ways the analytic approach and obtained
further identities. Kemperman’s book presents in detail the approach
based on Liouville’s theorem (already used by Pollaczek) and contains
a large number of examples. Baxter uses a method based on Fourier-Stieltjes
transforms and operators on functional Banach spaces. Wendel
introduces and investigates “order statistics” of $(S_1, \cdots, S_n)$ . . .

No attempt will be made here to apply the general random walk 
theory to concrete problems. The interested reader will find in Feller’s 
two volumes a large number of such problems. 


26.1 Set-up and basic implications. A sequence $S = (S_1, S_2, \cdots)$
of r.v.’s is called a random walk (on $R$) if the sequence of its random steps
$X = (X_1 = S_1, X_2 = S_2 - S_1, \cdots)$ at times $n = 1, 2, \cdots$ consists of
iid r.v.’s $X_1, X_2, \cdots$. A random walk determines the sequence of its
random steps, and conversely; similarly for the sub $\sigma$-fields of events:

$$\mathcal{B}_n = \mathcal{B}(X_1, \cdots, X_n) = \mathcal{B}(S_1, \cdots, S_n),$$
$$\mathcal{C}_n = \mathcal{B}(X_{n+1}, X_{n+2}, \cdots) = \mathcal{B}(S_{n+1} - S_n, S_{n+2} - S_n, \cdots).$$

We denote by $\mathcal{B}_\infty = \mathcal{B}(X_1, X_2, \cdots)$ the smallest $\sigma$-field generated by
the field $\bigcup_{n=1}^\infty \mathcal{B}_n$, and $\mathcal{C} = \bigcap_{n=1}^\infty \mathcal{C}_n$ is the tail $\sigma$-field of the sequence $X$; it is
important to realize that, in general, $\mathcal{C}$ is not the tail $\sigma$-field
$\bigcap_{n=1}^\infty \mathcal{B}(S_{n+1}, S_{n+2}, \cdots)$ of the sequence $S$.

We shall frequently adjoin $S_0 = 0$ to the random walk so that it will
become $(S_0, S_1, S_2, \cdots)$ with steps $X_n = S_n - S_{n-1}$, $n = 1, 2, \cdots$.
Intuitively it means that the random walk starts at time 0 at the origin.
We could also make it start at some $x \in R$ or choose $S_0$ to be a r.v. If the
random steps obey a law $\mathcal{L}(X)$ with only values $\pm nd$, $d > 0$, $n = 0,
1, \cdots$, then we have a very simple Markov chain with countable state
space, $0, \pm d, \pm 2d, \cdots$, and initial position 0, or some $md$, or a r.v.
with law $\mathcal{L}(X)$. It is strongly recommended that the reader interpret
the corresponding concepts and results in III of the Introductory Part
in the case of random walk theory, as found in this section.


The common law of the random steps will be denoted by $\mathcal{L}(X)$, its
d.f. on $R$ and corresponding pr. distribution on the Borel line will be
denoted by the same symbol $F$, and its ch.f. will be $f$. D.f.’s and
corresponding pr. distributions of their sums $S_n$, “positions” of the random
walk at times $n$, will be denoted by $F_n$ and their ch.f.’s are $f^n$, $n = 1,
2, \cdots$. If $\mathcal{L}(X)$ degenerates at 0 then the random walk stays a.s. at
$\{0\}$; from now on we exclude this trivial case. Note that if $\mathcal{L}(X)$ degenerates
at $a \ne 0$ then the random walk moves a.s. by degenerate steps $a$ from
$na$ to $(n + 1)a$, $n = 1, 2, \cdots$, and $S_n \to +\infty$ a.s. or $S_n \to -\infty$ a.s.
according as $a > 0$ or $a < 0$.


We distinguish two types of common laws $\mathcal{L}(X)$. Let $L_d = \{nd : n = 0,
\pm 1, \pm 2, \cdots\}$ be a lattice of span $d > 0$. We say that $X$ is $L_d$-distributed
if $\sum_{n=-\infty}^{+\infty} P(X = nd) = 1$ and there is no lattice of larger span $d' > d$ with
this property; according to the remark following 14.1a such a distribution
occurs if and only if $|f(u)| = 1$ for some $u \ne 0$. If there is no $d > 0$ such
that $X$ is $L_d$-distributed, we set $d = 0$, $L_0 = R$, and say that $X$ is
$L_0$-distributed; thus $X$ is $L_0$-distributed if and only if $|f(u)| < 1$ for all $u \ne 0$.
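The lattice criterion can be illustrated numerically. The sketch below assumes an integer-valued step distribution (a hypothetical example, with made-up function names): the span $d$ is then the greatest common divisor of the support, and the ch.f. has modulus 1 at $u = 2\pi/d$ while staying below 1 at a generic $u$.

```python
import cmath
import math
from functools import reduce


def char_fn(dist, u):
    """Characteristic function f(u) = sum of p * exp(i u x) over the support."""
    return sum(p * cmath.exp(1j * u * x) for x, p in dist.items())


def lattice_span(values):
    """Largest d such that every integer value is a multiple of d."""
    return reduce(math.gcd, (abs(v) for v in values))


if __name__ == "__main__":
    # Assumed example: steps -4, 6, 10 with probabilities 0.3, 0.5, 0.2.
    dist = {-4: 0.3, 6: 0.5, 10: 0.2}
    d = lattice_span(dist)
    print("span d =", d)
    print("|f(2*pi/d)| =", abs(char_fn(dist, 2 * math.pi / d)))
    print("|f(1)| =", abs(char_fn(dist, 1.0)))
```

For this example $d = 2$, so every value is an even integer and each term $e^{i\pi x}$ equals 1, which gives $|f(\pi)| = 1$.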


We now examine basic implications of the above set-up. 


POSSIBLE VALUES AND STATES. We say that $x \in R$ is a possible value of
a r.v. $X$ if $P(X \in V_x) > 0$ for every neighborhood $V_x$ of $x$. We say that
$x$ is a possible state of the random walk $S = (S_1, S_2, \cdots)$ if for every
given neighborhood $V_x$ of $x$ there is an $n = n(V_x)$ such that $P(S_n \in V_x)
> 0$. In either case, it suffices to consider neighborhoods of the form
$V_x = (x - \epsilon, x + \epsilon)$. Let $\Pi_s$ denote the possible states of the random
walk $S$, let $\Pi_n$ be the set of possible values of $S_n$, and set $\Pi_\infty = \bigcup_{n=1}^\infty \Pi_n$.

a. $\Pi_s$ contains $\Pi_\infty$ and is closed.

Proof. The first assertion results at once from the definitions. As
for the second one, since $x_m \to x$ as $m \to \infty$ implies that, given $\epsilon > 0$,
for $m$ sufficiently large $(x_m - 1/m, x_m + 1/m) \subset (x - \epsilon, x + \epsilon)$, it
follows that when the $x_m$ are possible states there is an $n$ such that

$$P(|S_n - x| < \epsilon) \ge P(|S_n - x_m| < 1/m) > 0.$$

We say that $x$ is a discontinuity value of $X$ if $P(X = x) > 0$. Clearly,
the set of discontinuity values of $X$ is the set of jumps of the discontinuous
part $F_X^d$ of the d.f. $F_X$.


b. If $x$ and $y$ are possible values of independent r.v.’s $X$ and $Y$ respectively,
then $x + y$ is a possible value of $X + Y$.

If $x$ and $y$ are discontinuity values of independent r.v.’s $X$ and $Y$ respectively
then $x + y$ is a discontinuity value of $X + Y$, and all such values
of $X + Y$ are of this form.

The first assertion obtains by

$$P(|X + Y - (x + y)| < \epsilon) \ge P(|X - x| < \epsilon/2) \times P(|Y - y| < \epsilon/2) > 0$$

and the second one results from

$$(F_X * F_Y)^d = F_X^d * F_Y^d.$$


A. POSSIBLE VALUES THEOREM. Let $X$ be $L_d$-distributed with $d \ge 0$.
(i) If neither $X \ge 0$ a.s. nor $X \le 0$ a.s. then when $d > 0$, $\Pi_\infty = L_d$
and when $d = 0$, $\Pi_\infty$ is dense in $L_0 = R$.
(ii) If either $X \ge 0$ a.s. or $X \le 0$ a.s. then when $d > 0$, from some $n$ on,
$nd$ or $-nd$, respectively, belong to $\Pi_\infty$ and when $d = 0$, for every given $\epsilon > 0$,
from some $x > 0$, $\Pi_\infty$ intersects $(x, x + \epsilon)$ or $(-x - \epsilon, -x)$, respectively.

Proof. We use b without further comment. We can assume that
$S_1 = X_1$ has a positive value $a$ so that $S_2 = X_1 + X_2$ has positive
value $2a$; otherwise, we change $X$ into $-X$. Thus, it suffices to prove
the theorem when there are positive values $a < b$. We follow Feller.

1°. Set $J_n = [na, nb)$. For $n \ge n_1 > a/(b - a)$, $[na, (n + 1)a) \subset J_n$
hence $\bigcup_{n \ge n_1} J_n = [n_1 a, \infty)$ and every $x \ge n_1 a$ belongs to some of the $J_n$ for
$n \ge n_1$. Since the $n + 1$ points $na + k(b - a)$, $k = 0, \cdots, n$, belong
to $\Pi_n$ and subdivide $J_n$ into intervals of length $b - a$, every $x \ge n_1 a$ is at
a distance at most $(b - a)/2$ from a member of $\Pi_\infty$.


2°. Suppose that for every given $\epsilon > 0$ there are possible values
$(0 <)\ a < b$ with $b - a < \epsilon$. Then $X$ is $L_0$-distributed for otherwise
every $\Pi_n$ hence $\Pi_\infty$ is contained in $L_d$ for some fixed $d > 0$ and we reach
a contradiction.

If $X \ge 0$ a.s., the assertion in (ii) for $d = 0$ follows from 1°.

If neither $X \ge 0$ a.s. nor $X \le 0$ a.s. then $X$ has a possible value
$c < 0$. Given $\epsilon > 0$, it follows from 1° that for arbitrary $x$ and sufficiently
large $n$ there is a $y \in \Pi_\infty$ belonging to $(-nc + x, -nc + x + \epsilon)$.
But $y + nc$ also belongs to $\Pi_\infty$. Thus, every interval of any given length
$\epsilon > 0$ intersects $\Pi_\infty$ so that $\Pi_\infty$ is dense in $L_0 = R$ and the assertion in (i)
for $d = 0$ is proved.

3°. Suppose now that whichever be the possible values $(0 <)\ a < b$,
there is an $\epsilon > 0$ such that $b - a \ge \epsilon$; we may assume $b - a < 2\epsilon$ for
some $a$ and $b$. Then the set $J_n \Pi_\infty$ consists of points $na + k(b - a)$,
$k = 0, \cdots, n$. Since $(n + 1)a$ is one of them, they all are multiples
of $b - a$. But for any $c \in \Pi_\infty$, for $n$ sufficiently large $J_n$ has a point of
the form $c + k(b - a)$ so that $c$ is also a multiple of $b - a$. Thus $X$ is
$L_d$-distributed with some $d > 0$ and the proof is completed.


COROLLARY. Let $X$ be $L_d$-distributed with $d \ge 0$. If neither $X \ge 0$
nor $X \le 0$ then the set of all possible states of the random walk coincides
with $L_d$.


Follows at once by a. 
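The lattice case $d > 0$ of the theorem can be explored by direct enumeration. The sketch below, with hypothetical function names and assumed step sets, collects the possible values of $S_1, \cdots, S_n$ for integer steps. With steps $\{-2, 3\}$ (span 1, both signs possible) every integer in a window around 0 is reached, while with steps $\{2, 3\}$ (one-sided) only the integers from 2 on are reached.

```python
def possible_values_up_to(steps, n_max):
    """Union of the sets of possible values of S_1, ..., S_{n_max}.

    `steps` lists the possible values of one step; S_0 = 0 is the start.
    """
    current = {0}
    reached = set()
    for _ in range(n_max):
        # Possible values of S_{n+1} are sums of a possible value of S_n
        # and a possible value of one more step (proposition b).
        current = {s + x for s in current for x in steps}
        reached |= current
    return reached


if __name__ == "__main__":
    two_sided = possible_values_up_to((-2, 3), 20)
    print(all(v in two_sided for v in range(-10, 11)))
    one_sided = possible_values_up_to((2, 3), 15)
    print(sorted(v for v in one_sided if v < 10))
```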


From now on, we take for $\Omega$ the set $\Omega = R^\infty$ of all numerical sequences
$x = (x_1, x_2, \cdots)$ and for the $\sigma$-field of events the $\sigma$-field $\mathcal{B}$ of Borel sets
in $R^\infty$, that is, the $\sigma$-field generated by the class of all cylinders of the
form $C(A_1 \times \cdots \times A_n)$, $n = 1, 2, \cdots$, where the $A$’s are linear
Borel sets. This choice does not restrict generality yet permits to avoid
possible ambiguities, say, about “translations.”


SLLN and 0-1 laws.
According to 17.4.4°

For $0 < r < 2$, $\dfrac{1}{n^{1/r}} \sum_{k=1}^n (X_k - a_r) \to 0$ a.s., with $a_r = 0$ or $EX$ according
as $r < 1$ or $r \ge 1$, if and only if $E|X|^r < \infty$.

For $r = 1$, we have Kolmogorov strong law of large numbers, SLLN
for short, which can be completed as follows (see also 34.4).


B. SLLN. Let $EX$ exist. Then $S_n/n \to EX$ a.s. Conversely, if
$S_n/n \to c$ a.s., necessarily a constant (finite or infinite), then $EX = c$.

Proof. It suffices to complete SLLN by considering the cases of
infinite $EX$ and $c$.
Let $EX = +\infty$, that is, $EX^+ = +\infty$, $EX^- < \infty$ and let $X_k(a) = X_k$
or $a \in R$ according as $X_k < a$ or $X_k \ge a$. Set $S_n(a) = \sum_{k=1}^n X_k(a)$. Since
$EX(a) < \infty$,

$$S_n/n \ge S_n(a)/n \to EX(a) \quad\text{a.s.}$$

hence, letting $a \uparrow \infty$ so that, by monotone convergence theorem,
$EX(a) \uparrow EX = +\infty$, we obtain $S_n/n \to EX = +\infty$ a.s.; similarly for
$EX = -\infty$, or change $X$ into $-X$.

For the converse, if $c = +\infty$ then, by what precedes,

$$EX^+ \leftarrow \frac{1}{n}\sum_{k=1}^n X_k^+ \ge \frac{1}{n}\sum_{k=1}^n X_k^+ - \frac{1}{n}\sum_{k=1}^n X_k^- \to +\infty$$

so that $EX^+ = +\infty$, hence $EX = +\infty$ since $EX$ exists; similarly for
$c = -\infty$, or change $X$ into $-X$.


SLLN utilizes fully the iid property of the summands. Independence
alone yields, as we know (16.3B), the

KOLMOGOROV ZERO-ONE LAW. On a sequence of independent r.v.’s tail
events have for pr. either 0 or 1 and tail functions are degenerate.


This zero-one law, while applying to $X = (X_1, X_2, \cdots)$, does not
apply to the random walk $S = (S_1, S_2, \cdots)$. Yet, the iid property of
the summands implies “exchangeability”, and a new zero-one law will
apply to $S$:

We say that a sequence $X = (X_1, X_2, \cdots)$ of r.v.’s is exchangeable
or that the r.v.’s $X_1, X_2, \cdots$ are exchangeable if the distribution of $X$
is invariant under all finite exchanges of its terms or, equivalently, of
their subscripts; in symbols, for every $n$ and every one of the $n!$ permutations
$\omega_n$ of $(1, \cdots, n)$ into $(k_1, \cdots, k_n)$,

$$\mathcal{L}(X) = \mathcal{L}(\omega_n X) = \mathcal{L}(X_{k_1}, \cdots, X_{k_n}, X_{n+1}, X_{n+2}, \cdots).$$

We say that a measurable function $g(X)$ is exchangeable if it is invariant
under all permutations $\omega_n$ of its arguments: $g(\omega_n X) = g(X)$, $n = 1,
2, \cdots$; in particular, an event on $X$ is exchangeable if its indicator is
exchangeable. Clearly, on $X$ every tail event and every tail function are
exchangeable. In fact, by the iid property of its terms, $X$ is exchangeable
while, for every $n$, the sequences $(S_n, S_{n+1}, \cdots)$ are invariant under
permutations $\omega_n$ of $(1, \cdots, n)$. Thus, the second assertion below follows
at once, while the first one results directly from the definitions:

c. (i) On $X$, exchangeable events form a $\sigma$-field $\mathcal{E}$ and exchangeable functions
are $\mathcal{E}$-measurable.

(ii) On the random walk $S$ corresponding to the sequence $X$ of iid steps,
tail events are exchangeable (belong to $\mathcal{E}$) and tail functions are exchangeable
(are $\mathcal{E}$-measurable).

In general, tail events and tail functions on $S$, say $[S_n \in A_n \text{ i.o.}]$ where
$A_n$ are linear Borel sets, $\liminf S_n$, $\limsup S_n$, while exchangeable, are not
tail events on $X$ and Kolmogorov zero-one law does not apply. Yet

B. HEWITT-SAVAGE ZERO-ONE LAW. On a sequence of iid r.v.’s exchangeable
events have for pr. either 0 or 1 and exchangeable functions are
degenerate.


To prove this theorem we require an elementary measure-theoretic
proposition. Let $A \,\Delta\, B = AB^c + A^cB$.

d. APPROXIMATION LEMMA. Let $(\Omega, \mathcal{A}, P)$ be a pr. space. If a field $\mathcal{D}$
generates $\mathcal{A}$ then for every given $A \in \mathcal{A}$ and every $\epsilon > 0$ there is a $D \in \mathcal{D}$
such that $P(A \,\Delta\, D) \le \epsilon$.

For, clearly, the class of all sets $A \in \mathcal{A}$ with the asserted property
contains $\mathcal{D}$ and it is easily verified that this class is monotone; thus, by
1.6A, it coincides with $\mathcal{A}$.

The approximation property can be restated as follows. Let $A \in \mathcal{A}$
and $\epsilon_n \downarrow 0$. There are $D_n \in \mathcal{D}$ such that $P(A \,\Delta\, D_n) \le \epsilon_n \to 0$, that is,
$P(AD_n^c) \to 0$ and $P(A^cD_n) \to 0$. Therefore, $PD_n \to PA$ since $PA =
PAD_n + PAD_n^c = PD_n - PA^cD_n + PAD_n^c$.


Proof of B. In our case $\mathcal{D} = \bigcup \mathcal{B}_n$ so that, given an exchangeable
event $A$ (in fact, any event) there is a sequence $B_n \in \mathcal{B}_{k_n}$ with
$P(A \,\Delta\, B_n) \to 0$ hence $PB_n \to PA$; we can and do select $k_1 < k_2 < \cdots$.
Let $C_n$ be the events obtained from $B_n$ by the permutation of $(1, \cdots, k_n,
k_n + 1, \cdots, 2k_n)$ into $(k_n + 1, \cdots, 2k_n, 1, \cdots, k_n)$; thus, $B_n \in \mathcal{B}_{k_n} =
\mathcal{B}(X_1, \cdots, X_{k_n})$ implies $C_n \in \mathcal{C}_{k_n} = \mathcal{B}(X_{k_n+1}, X_{k_n+2}, \cdots)$ and, $\mathcal{B}_{k_n}$
and $\mathcal{C}_{k_n}$ being independent so are $B_n$ and $C_n$. But this permutation leaves
the distribution of $X$ invariant while $A$, being exchangeable, remains the
same and $A \,\Delta\, B_n$ is changed into $A \,\Delta\, C_n$ so that

$$P(A \,\Delta\, C_n) = P(A \,\Delta\, B_n) \to 0$$

hence $PC_n \to PA$; also

$$P(A \,\Delta\, B_nC_n) \le P(A \,\Delta\, B_n) + P(A \,\Delta\, C_n) \to 0$$

hence $P(B_nC_n) \to PA$. Therefore, $B_n$ and $C_n$ being independent,

$$PA \leftarrow P(B_nC_n) = PB_n \cdot PC_n \to (PA)^2$$

so that $PA = 0$ or 1. The first assertion is proved and the second follows.


Consequences. 1. $P[S_n \in A_n \text{ i.o.}] = 0$ or 1, $\liminf S_n$ and $\limsup S_n$
are degenerate.

2. THREE ALTERNATIVES. For a (nondegenerate at 0) random walk
$(S_1, S_2, \cdots)$ there are exactly three asymptotic alternatives:

(i) $S_n \to -\infty$ a.s. (drifts to $-\infty$)

(ii) $S_n \to +\infty$ a.s. (drifts to $+\infty$)

(iii) $-\infty = \liminf S_n < \limsup S_n = +\infty$ a.s. (oscillates between
$-\infty$ and $+\infty$).

Proof. Since $\liminf S_n = c$ a.s. where the constant $c$ may be finite or
infinite, and $(S_2 - S_1, S_3 - S_1, \cdots)$ has the same distribution as
$(S_1, S_2, \cdots)$, we have

$$\liminf\,(S_{n+1} - X_1) = \liminf S_n \quad\text{a.s.}$$

hence $c = X_1 + c$ a.s. The case $X_1 = 0$ a.s. being excluded (that is, the
trivial alternative that the random walk stays at 0 a.s. is excluded), we must
have $c = +\infty$ or $c = -\infty$. Thus a.s.

either $\liminf S_n = -\infty$ or $\lim S_n = \liminf S_n = +\infty$
and, changing $X$ into $-X$, a.s.
either $\limsup S_n = +\infty$ or $\lim S_n = \limsup S_n = -\infty$.

The three alternatives assertion follows.

RANDOM TIMES. 

Translations $\theta^n$ on $X = (X_1, X_2, \cdots)$ are defined by

$$\theta^n X = \theta^n(X_1, X_2, \cdots) = (X_{n+1}, X_{n+2}, \cdots), \quad n = 1, 2, \cdots$$

so that

the terms of $\theta^n X$ are iid with same common law $\mathcal{L}(X)$ as the terms of $X$.

Thus, $\theta^n X$ has same distribution as $X$ and therefore $X$ is said to be
stationary (see also 33.3). The random walks corresponding to $X$ and to
$\theta^n X$ are, respectively, $(S_1, S_2, \cdots)$ and $(S_{n+1} - S_n, S_{n+2} - S_n, \cdots)$ with
same distribution, and the $\sigma$-fields $\mathcal{B}(S_1, \cdots, S_n) = \mathcal{B}_n$ and $\mathcal{B}(S_{n+1} - S_n,
S_{n+2} - S_n, \cdots) = \mathcal{B}(X_{n+1}, X_{n+2}, \cdots) = \mathcal{C}_n$ are independent.
These properties extend to “random times” (times $n\ (= 1, 2, \cdots)$ becoming
“degenerate” random times) as follows (see also 39.2 and 41.4
taking therein T = (1, 2,---) and d = &),

Given a nondecreasing sequence $(\mathcal{B}_n)$ of sub $\sigma$-fields of events, a
measurable function $\tau$ to $(1, 2, \cdots, \infty)$ is a $(\mathcal{B}_n)$-time if $[\tau = n] \in \mathcal{B}_n$,
$n = 1, 2, \cdots$; if there is no confusion possible, we say that $\tau$ is a random
time. Clearly, a random time $\tau$ is $\mathcal{B}_\tau$-measurable with $\sigma$-field $\mathcal{B}_\tau =
\{\text{events } B : B[\tau = n] \in \mathcal{B}_n,\ n = 1, 2, \cdots\}$. If $\tau < \infty$ a.s., we define
$X_{\tau+k}(\omega)$ by $X_{\tau(\omega)+k}(\omega)$ so that the $X_{\tau+k}$ are r.v.’s, $k = 0, 1, \cdots$. Then
the $\sigma$-field $\mathcal{B}(X_{\tau+1}, X_{\tau+2}, \cdots)$ is denoted by $\mathcal{C}_\tau$ and translation by $\tau$ of $X$
is defined by

$$\theta^\tau(X_1, X_2, \cdots) = (X_{\tau+1}, X_{\tau+2}, \cdots).$$

The above properties of translations by $n$ remain valid as follows.


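C. RANDOM TIMES TRANSLATIONS. If a $(\mathcal{B}_n)$-time $\tau < \infty$ a.s. then the
$\sigma$-fields $\mathcal{B}_\tau$ and $\mathcal{C}_\tau$ are independent and the sequences $X = (X_1, X_2, \cdots)$
and $\theta^\tau X = (X_{\tau+1}, X_{\tau+2}, \cdots)$ have same distribution.

Proof. The assertions mean that, for any pair of events $B_\tau \in \mathcal{B}_\tau$ and
$B \in \mathcal{B}_\infty = \mathcal{B}(X_1, X_2, \cdots)$,

(1) $P(B_\tau[\theta^\tau X \in B]) = PB_\tau \cdot P(X \in B)$.
By definition, $\theta^\tau = \theta^n$ on $[\tau = n]$ hence

$$P(B_\tau[\theta^\tau X \in B]) = \sum_n P(B_\tau[\tau = n][\theta^n X \in B]).$$

Since $B_\tau[\tau = n] \in \mathcal{B}_n$ and $\theta^n X$ is $\mathcal{C}_n$-measurable, independence of $\mathcal{B}_n$ and
$\mathcal{C}_n$ implies

$$P(B_\tau[\tau = n][\theta^n X \in B]) = P(B_\tau[\tau = n]) \cdot P(X \in B).$$

Since $\sum_n P(\tau = n) = 1$, and $\theta^n X$ has same distribution as $X$, (1) becomes

$$P(B_\tau[\theta^\tau X \in B]) = \sum_n P(B_\tau[\tau = n]) \cdot P(X \in B) = PB_\tau \cdot P(X \in B)$$

and the proof is terminated.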


The above argument is characteristic of extensions of properties of
times $n$ to random times $\tau < \infty$ a.s.: use the definitions and the asserted
property, valid on $[\tau = n]$, $n = 1, 2, \cdots$. For example, upon setting
$\tau_1 = \tau < \infty$ a.s. then defining $\tau_2$ by $\tau_2 = \tau$ but on $\theta^{\tau_1} X$ in lieu of $X$, and
so on, it easily follows that

1. $S_{\tau_1}$, $S_{\tau_1 + \tau_2} - S_{\tau_1}$, $\cdots$ are iid r.v.’s.

By the same procedure but with $E$ in lieu of $P$, additivity of expectations,
which for random walks becomes $ES_n = nEX$, extends to $\tau < \infty$
a.s. in lieu of $n$, upon using $E\tau = \sum_n nP(\tau = n) = \sum_n P(\tau \ge n)$ as follows.

2. WALD’S RELATION. $ES_\tau = E\tau \cdot EX$ in the sense that if the right side
exists so does the left one and then both are equal.

Note that the right side exists when $E\tau < \infty$ and $EX$ exists, or when
$E\tau = \infty$ and $EX$ is finite or $EX \ge 0$ or $EX \le 0$.


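Proof. Let $0 \le EX \le \infty$ and $E\tau \le \infty$. Then

$$ES_\tau = \sum_{n=1}^\infty E(S_n I_{[\tau = n]}) = \sum_{n=1}^\infty \sum_{k=1}^n E(X_k I_{[\tau = n]}) = \sum_{k=1}^\infty \sum_{n=k}^\infty E(X_k I_{[\tau = n]})$$
$$= \sum_{k=1}^\infty E(X_k I_{[\tau \ge k]}) = \sum_{k=1}^\infty EX_k \, P[\tau \ge k] = EX \cdot E\tau.$$

The last but one equality is due to the fact that $[\tau < k]$ belongs to
$\mathcal{B}_{k-1}$ hence so does its complement $[\tau \ge k]$, while $X_k$ is $\mathcal{C}_{k-1}\,(= \mathcal{B}(X_k,
X_{k+1}, \cdots))$-measurable, and $\mathcal{B}_{k-1}$ and $\mathcal{C}_{k-1}$ are independent.

Changing $X$ into $-X$ the same relation holds. The other cases follow
from $EX = EX^+ - EX^-$ with $EX^+$ or $EX^-$ finite.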
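Wald's relation can be verified exactly on a small case by enumerating paths. The sketch below is illustrative only: it assumes steps $+1$ with probability $p$ and $-1$ with probability $1 - p$, and the bounded random time $\tau = \min(\text{first } n \text{ with } S_n = 1,\ N)$, where the horizon $N$ and the function name are hypothetical choices. Rational arithmetic keeps both sides exact.

```python
from fractions import Fraction
from itertools import product


def wald_sides(p, horizon):
    """Return (E S_tau, E tau * E X) for the assumed +1/-1 walk.

    tau is the first n with S_n = 1, capped at `horizon`, so [tau = n]
    depends only on X_1, ..., X_n and tau is a bounded random time.
    """
    e_tau = Fraction(0)
    e_s_tau = Fraction(0)
    for path in product((1, -1), repeat=horizon):
        prob = Fraction(1)
        for x in path:
            prob *= p if x == 1 else 1 - p
        s, tau = 0, horizon
        for n, x in enumerate(path, start=1):
            s += x
            if s == 1:
                tau = n
                break
        # Here s equals S_tau: either 1 (level hit) or S_horizon (cap reached).
        e_tau += prob * tau
        e_s_tau += prob * s
    e_x = p - (1 - p)
    return e_s_tau, e_tau * e_x


if __name__ == "__main__":
    left, right = wald_sides(Fraction(1, 3), 8)
    print(left, right, left == right)
```

Capping $\tau$ at a finite horizon guarantees $E\tau < \infty$, so the right side exists whatever the sign of $EX$.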


We shall frequently encounter the hitting or first visit time $\tau_A$ of a
linear Borel set $A$ by a random walk $(S_1, S_2, \cdots)$:
$\tau_A(\omega) = \min\{n : S_n(\omega) \in A\}$ for $\omega \in \bigcup_n [S_n \in A]$ and $\tau_A(\omega) = \infty$
otherwise. Clearly $\tau_A$ is a random walk time, since for every $n$,

$$[\tau_A = n] = [S_k \in A^c \text{ for } k < n,\ S_n \in A] \in \mathcal{B}(S_1, \cdots, S_n).$$

Similarly for other random times we shall encounter: In general, the
fact that they are random walk times will be clear from their definitions.


ANDERSEN EQUIVALENCE. 


“Finite exchangeability” alone suffices for a basic Andersen result for
“finite fluctuations.” We set $\mathbf{X}_n = (X_1, \cdots, X_n)$ and say that the random
vector $\mathbf{X}_n$ is exchangeable or that its components $X_1, \cdots, X_n$ are
exchangeable if the distribution of $\mathbf{X}_n$ is invariant under the $n!$ permutations
of its components. We say that a measurable function $g(\mathbf{X}_n) =
g(X_1, \cdots, X_n)$ is exchangeable if it is invariant under the $n!$ permutations
of its arguments.

e. ANDERSEN EQUIVALENCE LEMMA. Let $X_1, \cdots, X_n$ be exchangeable
and let $S_0, \cdots, S_n$ be their partial sums $S_0 = 0$, $S_1 = X_1$, $\cdots$, $S_n =
X_1 + \cdots + X_n$.

If $\nu_n$ is the (random) number of positive terms in $(S_0, \cdots, S_n)$ and $\tau_n$ is
the (random) time of occurrence of the first maximum of its terms, then $\nu_n$
and $\tau_n$ are identically distributed.


This result is an immediate consequence of a combinatorial lemma
due to Feller whose elementary proof, modified by Joseph (as reported
in Feller), follows.

f. COMBINATORIAL LEMMA. To each permutation $(x_{k_1}, \cdots, x_{k_n})$ of
$(x_1, \cdots, x_n)$ associate the sequence $0, x_{k_1}, \cdots, x_{k_1} + \cdots + x_{k_n}$ of its partial
sums. Let $m = 0, 1, \cdots, n$.

The number $N_m$ of permutations with exactly $m$ positive sums is the same
as the number $T_m$ of permutations in which the first maximum of partial
sums occurs at time $m$.


Proof. Let $N_{mk}$ and $T_{mk}$ correspond to $N_m$ and $T_m$ when $x_k$ is omitted
in $(x_1, \cdots, x_n)$. We use induction: The assertion holds for $n = 1$ since,
clearly, $x_1 \le 0$ implies $N_0 = T_0 = 1$ and $N_1 = T_1 = 0$ while $x_1 > 0$ implies
$N_0 = T_0 = 0$ and $N_1 = T_1 = 1$. Suppose it holds for $n - 1 \ge 1$,
that is, $N_{mk} = T_{mk}$ for $k = 1, \cdots, n$ and $m = 0, \cdots, n - 1$; since trivially
$N_{nk} = T_{nk} = 0$, it also holds for $m = n$.

We use the fact that by fixing $x_k$ and permuting the $n - 1$ remaining
$x$’s then varying $k = 1, \cdots, n$, we obtain the $n!$ permutations of
$(x_1, \cdots, x_n)$.

If $s_n \le 0$ then $N_m$ and $T_m$ depend only on $x_{k_1}, \cdots, x_{k_{n-1}}$ hence, by
induction hypothesis,

$$N_m = \sum_{k=1}^n N_{mk} = \sum_{k=1}^n T_{mk} = T_m.$$

If $s_n > 0$ then $N_m = \sum_{k=1}^n N_{m-1,k}$. As for $T_m$, consider all $(x_k, x_{k_1}, \cdots, x_{k_{n-1}})$
starting with $x_k$. Since $x_k + x_{k_1} + \cdots + x_{k_{n-1}} > 0$ the maximal terms of
their partial sums cannot be $s_0$. Since the first maximum occurs for
$m\ (= 1, \cdots, n)$ if and only if the first maximum of partial sums of $(x_{k_1}, \cdots,
x_{k_{n-1}})$ occurs for $m - 1$, we have

$$N_m = \sum_{k=1}^n N_{m-1,k} = \sum_{k=1}^n T_{m-1,k} = T_m.$$

By using an argument formulated by Spitzer, instead of proving e we
can prove the more general

D. EQUIVALENCE THEOREM. Let $g(\mathbf{X}_n)$ be an integrable function of an
exchangeable random vector $\mathbf{X}_n = (X_1, \cdots, X_n)$.
If $g(\mathbf{X}_n)$ is exchangeable then, for $k = 0, 1, \cdots, n$,

$$E(g(\mathbf{X}_n) I_{[\nu_n = k]}) = E(g(\mathbf{X}_n) I_{[\tau_n = k]});$$

in particular,

$$E(e^{iuS_n} I_{[\nu_n = k]}) = E(e^{iuS_n} I_{[\tau_n = k]}), \quad u \in R,$$

and

$$P[\nu_n = k] = P[\tau_n = k].$$

Proof. Let $F_n$ be the d.f. of $\mathbf{X}_n$ and $\mathbf{x}_n = (x_1, \cdots, x_n)$.
Denote by $\Sigma$ summations over the $n!$ permutations $\omega_n$ of $(1, \cdots, n)$.
Since $g(\mathbf{X}_n)$ is exchangeable

$$E(g(\mathbf{X}_n) I_{[\nu_n = k]}) = \frac{1}{n!} \int g(\mathbf{x}_n)\, \Sigma\, I_{[\nu_n = k]}(\omega_n \mathbf{x}_n)\,dF_n(\mathbf{x}_n)$$

and, by the combinatorial lemma,

$$\Sigma\, I_{[\nu_n = k]}(\omega_n \mathbf{x}_n) = \Sigma\, I_{[\tau_n = k]}(\omega_n \mathbf{x}_n).$$

Thus the first sum equals the same sum but with $\tau_n$ in lieu of $\nu_n$ hence the
expectation equals the one with $\tau_n$ in lieu of $\nu_n$. The particular case with
$g(\mathbf{X}_n) = e^{iuS_n}$ follows and then, setting $u = 0$, the last assertion, which
is that of e, results.


By means of his equivalence, Andersen obtained his first limit theorem
for finite fluctuations, namely

ARCSINE LAW. Let $S_0\,(= 0), S_1, \cdots$ be partial sums of iid summands
$X_1, X_2, \cdots$ with common law $\mathcal{L}(X)$.
If $\mathcal{L}(X)$ is symmetric with $P(X = 0) = 0$ then

$$P(\nu_n/n < x) \to \frac{2}{\pi} \operatorname{Arcsin} \sqrt{x}, \quad 0 \le x \le 1.$$

The Arcsine law was discovered by P. Lévy in his study of Brownian
motion, then obtained by Erdős and Kac as a limit theorem for sums of
independent random variables with finite second moments and obeying
Lindeberg condition (see also Chapter XII). Andersen’s result which
does not require second finite moments was unexpected and drew attention
to his approach.


The proof is based upon the following considerations. The event
$[\tau_n = k]$ consists in the occurrence of events $[S_k > S_0, \cdots, S_k > S_{k-1}]$
and $[S_{k+1} - S_k \le 0, \cdots, S_n - S_k \le 0]$. The first one belongs to $\mathcal{B}(X_1,
\cdots, X_k)$ and the second one belongs to $\mathcal{B}(X_{k+1}, \cdots, X_n)$ and these two
$\sigma$-fields are independent. Furthermore, $(X_{k+1}, \cdots, X_n)$ is distributed as
$(X_1, \cdots, X_{n-k})$. It follows that

$$P(\tau_n = k) = P(\tau_k = k)\,P(\tau_{n-k} = 0)$$

and, by Andersen equivalence,
(1) $P(\nu_n = k) = P(\nu_k = k)\,P(\nu_{n-k} = 0)$.
Let

$$p_n(k) = \frac{1}{2^{2n}} \cdot \frac{(2k)!}{k!\,k!} \cdot \frac{(2(n-k))!}{(n-k)!\,(n-k)!}, \quad k = 0, \cdots, n,$$

so that

$$p_n(k) = p_n(n - k), \quad \sum_{k=0}^n p_n(k) = 1.$$

We prove by induction that
(2) $P(\nu_n = k) = p_n(k)$.
For $n = 1$, we have

$$P(\nu_1 = 0) = P(\nu_1 = 1) = \frac{1}{2} = p_1(0) = p_1(1).$$

If (2) holds for $n - 1$ hence, by (1), $P(\nu_n = k) = p_n(k)$ for $k = 1, \cdots,
n - 1$, then

$$P(\nu_n = 0) + P(\nu_n = n) = 1 - \sum_{k=1}^{n-1} P(\nu_n = k) = 1 - \sum_{k=1}^{n-1} p_n(k) = p_n(0) + p_n(n).$$

Since the hypothesis about $\mathcal{L}(X)$ implies easily that $P(\nu_n = 0) =
P(\nu_n = n)$, it follows that

$$P(\nu_n = 0) = p_n(0) = P(\nu_n = n) = p_n(n).$$

Once (2) is proved, the Arcsine law follows by elementary computations 
using Stirling’s formula (see Introductory Part, 7). 
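The properties of $p_n(k)$ used in the induction, and the arcsine limit itself, can be checked numerically. The sketch below is a minimal illustration with hypothetical function names: it computes $p_n(k)$ exactly with rationals, verifies symmetry, total mass 1 and the factorization $p_n(k) = p_k(k)\,p_{n-k}(0)$ that matches relation (1), and compares a partial sum with $(2/\pi)\arcsin\sqrt{x}$.

```python
from fractions import Fraction
from math import asin, comb, pi, sqrt


def p(n, k):
    """p_n(k) = C(2k, k) * C(2(n-k), n-k) / 2^(2n), as an exact rational."""
    return Fraction(comb(2 * k, k) * comb(2 * (n - k), n - k), 4 ** n)


def cdf(n, x):
    """Sum of p_n(k) over k < n*x, i.e. the weight that (2) assigns to [nu_n / n < x]."""
    return float(sum(p(n, k) for k in range(n + 1) if k < n * x))


def arcsine_cdf(x):
    """Limit distribution function (2/pi) * arcsin(sqrt(x)) on [0, 1]."""
    return (2.0 / pi) * asin(sqrt(x))


if __name__ == "__main__":
    n = 10
    print(sum(p(n, k) for k in range(n + 1)))      # total mass
    print(p(n, 3) == p(n, 7))                      # symmetry
    print(p(n, 3) == p(3, 3) * p(n - 3, 0))        # factorization as in (1)
    print(cdf(400, 0.25), arcsine_cdf(0.25))       # finite n versus the limit
```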


We shall use the foregoing basic implications without further com- 
ment. 


26.2. Dichotomy: recurrence and transience. We recall that $x \in R$
is a possible state of a random walk $(S_1, S_2, \cdots)$ if for every neighborhood
$V_x$ there is an $n = n(V_x)$ such that $P(S_n \in V_x) > 0$. We say that $x$ is
a recurrent state of the random walk, if, for every $V_x$, $P(S_n \in V_x \text{ i.o.}) = 1$;
as usual “i.o.” stands for “infinitely often,” that is, for infinitely many $n$,
and “f.o.” for “finitely often” will stand for denial of “i.o.”, that is, for
“at most finitely many $n$.” Thus, to say that $x$ is recurrent is equivalent
to $P(S_n \in V_x \text{ f.o.}) = 0$. Clearly, a recurrent state is possible and it suffices
to consider neighborhoods $V_x$ of the form $(x - \epsilon, x + \epsilon)$.


a. If a random walk has a recurrent state x then all possible states are 
recurrent. 


Proof. If $y$ is a possible state, that is, for every $\epsilon > 0$ there is a $k = k(\epsilon)$ such that $P(|S_k - y| < \epsilon) > 0$, then $x - y$ is recurrent: For then,
$$0 = P(|S_n - x| < 2\epsilon \text{ f.o.}) \ge P(|S_k - y| < \epsilon, \; |S_{n+k} - S_k - (x - y)| < \epsilon \text{ f.o.}) = P(|S_k - y| < \epsilon) \, P(|S_n - (x - y)| < \epsilon \text{ f.o.})$$
hence $P(|S_n - (x - y)| < \epsilon \text{ f.o.}) = 0$ and $x - y$ is recurrent. It follows that every possible state $y = x - (x - y)$ is recurrent and so is $x - x = 0$.


Thus we are led to a dichotomy: A random walk is recurrent if one 
hence all its possible states are recurrent, or it is transient if none of its 
possible states is recurrent. 


As usual, $\mathcal{L}(X)$ denotes the common law of the iid random steps $X_1, X_2, \cdots$ which generate the random walk.


A. RECURRENCE THEOREM. Let $X$ be $L_d$-distributed with $d \ge 0$. The random walk is recurrent if and only if one of its possible states is recurrent, and then $L_d$ is the set of its states.


Proof. If the set $\mathcal{R}$ of recurrent states is not empty then, by a, the random walk is recurrent while the converse is trivially true.
Let the random walk be recurrent. Always by a, $\mathcal{R}$ is closed under differences and $0 \in \mathcal{R}$. It follows that $x \in \mathcal{R} \Rightarrow -x = 0 - x \in \mathcal{R}$, and $\mathcal{R}$ is an additive group. Furthermore, $\mathcal{R}$ is topologically closed since, for any given $V_x$, if recurrent $x_n \to x$ then, from some $n$ on, $x_n \in V_x$ hence $P(S_n \in V_x \text{ i.o.}) = 1$, and $x$ is recurrent. Since the trivial case of random walks degenerate at 0 is excluded, $\mathcal{R} \ne \{0\}$ and the only such closed subgroups in $R$ are of the form $\mathcal{R} = L_{d'}$ with $d' \ge 0$. If $d' = 0$ then $d = 0$. If $d' > 0$ then $L_{d'} \subset L_d$ hence $d \le d'$. Suppose $d < d'$ so that there is a possible state which is not recurrent. This contradicts the hypothesis that the random walk is recurrent. Thus $d = d'$, and the proof is terminated.


COROLLARY. Let $X$ be $L_d$-distributed with $d \ge 0$. Either $P(S_n \in V \text{ i.o.}) = 1$ for all bounded open sets $V$ intersecting $L_d$, or $P(S_n \in V \text{ i.o.}) = 0$ for all such $V$.


B. DICHOTOMY CRITERION. Let $X$ be $L_d$-distributed with $d \ge 0$.

(i) If $\sum_{n=1}^{\infty} P(S_n \in J) = \infty$ for some bounded open interval $J$, necessarily intersecting $L_d$, then the random walk is recurrent.

(ii) If $\sum_{n=1}^{\infty} P(S_n \in J) < \infty$ for some bounded open interval $J$ intersecting $L_d$, then the random walk is transient.


Proof. By the Borel-Cantelli lemma, the hypothesis in (ii) implies $P(S_n \in J \text{ i.o.}) = 0$ for some bounded open interval $J$ intersecting $L_d$ so that there is a possible state which is not recurrent hence, by A, no state is recurrent and the random walk is transient.


Let $\sum_{n=1}^{\infty} P(S_n \in J) = \infty$ for some bounded open interval $J$ with length $|J|$. Then, for every $\epsilon < |J|/2$ there is a $J_x = (x - \epsilon, x + \epsilon) \subset J$ such that $\sum_{n=1}^{\infty} P(S_n \in J_x) = \infty$. Consider the time $\tau$ of the last visit by the random walk to $J_x$, if any, and set $\tau = 0$ if none and $\tau = \infty$ if infinitely many. Thus, for $k = 1, 2, \cdots$,
$$A_n = [\tau = n] = [S_n \in J_x, \; S_{n+k} \notin J_x \text{ for all } k], \quad n \ge 1,$$
and
$$A_0 = [\tau = 0] = [S_n \notin J_x \text{ for all } n],$$
hence
$$P(\tau < \infty) = P(S_n \in J_x \text{ f.o.}) = \sum_{n=0}^{\infty} PA_n.$$
Since, for $n \ge 1$,
$$PA_n \ge P(S_n \in J_x, \; |S_{n+k} - S_n| \ge 2\epsilon \text{ for all } k) = P(S_n \in J_x) \, P(|S_k| \ge 2\epsilon \text{ for all } k),$$
it follows that
$$1 \ge P(S_n \in J_x \text{ f.o.}) \ge P(|S_k| \ge 2\epsilon \text{ for all } k) \sum_{n=1}^{\infty} P(S_n \in J_x).$$
Thus, $\sum_{n=1}^{\infty} P(S_n \in J_x) = \infty$ implies that for every $\epsilon > 0$
$$(1) \qquad P(|S_k| \ge 2\epsilon \text{ for all } k) = 0.$$
This relation implies recurrence of 0 hence of the random walk, as follows.

Take $J_0 = (-\epsilon, +\epsilon)$, let $J_\delta = (-\delta, \delta)$ with $0 < \delta < \epsilon$, and define the corresponding $A_n^0$ as the $A_n$ were defined but replacing $x$ by 0. Note that, by (1), $PA_0^0 = P(|S_k| \ge \epsilon \text{ for all } k) = 0$. In fact, all $PA_n^0 = 0$ for $n \ge 1$: For, as $\delta \uparrow \epsilon$,
$$A_{n,\delta}^0 = [S_n \in J_\delta, \; S_{n+k} \notin J_0 \text{ for all } k] \uparrow A_n^0$$
hence $PA_{n,\delta}^0 \to PA_n^0$ and, by (1), $PA_n^0 = 0$ since
$$P(S_n \in J_\delta, \; S_{n+k} \notin J_0 \text{ for all } k) \le P(S_n \in J_\delta, \; |S_{n+k} - S_n| \ge \epsilon - \delta \text{ for all } k) = P(S_n \in J_\delta) \, P(|S_k| \ge \epsilon - \delta \text{ for all } k) = 0.$$

Thus,
$$P(S_n \in J_0 \text{ f.o.}) = \sum_{n=0}^{\infty} PA_n^0 = 0$$
so that 0 is recurrent, and the proof is completed.

COROLLARY. If for some bounded open interval $J$ intersecting $L_d$, $\sum_{n=1}^{\infty} P(S_n \in J)$ is either infinite or finite, then the same holds, respectively, for all such $J$.

The elementary proofs of A and B are the original ones and are due to Feller, while the proof of C is due to Chung and Ornstein and that of D is due to Chung and Fuchs as modified by Feller.




The next proposition provides us with a dichotomy criterion in terms of one numerical characteristic of $\mathcal{L}(X)$, namely in terms of $EX$ provided it exists.


C. EXPECTATION CRITERION. Let $EX$ exist. Then the random walk is recurrent if and only if $EX = 0$. More precisely:
(i) If $EX = 0$ then the random walk is recurrent and a.s.
$$-\infty = \liminf S_n < \limsup S_n = +\infty.$$
(ii) If $EX > 0$ or $EX < 0$ then the random walk is transient and $S_n \to +\infty$ a.s. or $S_n \to -\infty$ a.s., respectively.


To prove this proposition we need the lemma below; we introduce $S_0 = 0$ and write $I(A)$ in lieu of $I_A$ for any event $A$.


b. For every $c > 0$ and every integer $m \ge 1$
$$\frac{1}{2m} \sum_{n=0}^{\infty} P(|S_n| < mc) \le \sum_{n=0}^{\infty} P(|S_n| < c).$$

Proof. Let the right side be finite; otherwise there is nothing to prove.

Let $J$ be an interval of length $c$ and let $\nu = \sum_{n=1}^{\infty} I(S_n \in J)$ be the number of visits to $J$ by the random walk $(S_1, S_2, \cdots)$ so that their expected number is $E\nu = \sum_{n=1}^{\infty} P(S_n \in J)$. Set $\tau = \min\{k \ge 1 : S_k \in J\}$ when this set is not empty and $\tau = \infty$ when it is; $\tau$ is the time of the first visit to $J$ and $E\nu = \sum_{n=1}^{\infty} E(\nu I(\tau = n))$. On $[\tau = n]$, $I(S_k \in J) = 0$ for $k < n$ while $I(S_n \in J) = 1$ hence
$$\nu = 1 + \sum_{k=n+1}^{\infty} I(S_k \in J) = 1 + \sum_{k=n+1}^{\infty} I((S_k - S_n) + S_n \in J) \le 1 + \sum_{k=n+1}^{\infty} I(|S_k - S_n| < c),$$
and the last bound is distributed as $1 + \sum_{k=1}^{\infty} I(|S_k| < c) = \sum_{k=0}^{\infty} I(|S_k| < c)$.

Since $[\tau = n]$ and $S_k - S_n$ are independent when $k > n$, it follows that
$$E\nu \le \sum_{n=1}^{\infty} \Big\{ P(\tau = n) \sum_{k=0}^{\infty} P(|S_k| < c) \Big\} \le \sum_{n=0}^{\infty} P(|S_n| < c).$$
Therefore,
$$\sum_{n=0}^{\infty} P(S_n \in J) \le \sum_{n=0}^{\infty} P(|S_n| < c)$$
since $P(S_0 \in J) = 0$ unless $0 \in J$ when this inequality holds trivially, term by term. Upon replacing $J$ by $J_j = [jc, (j + 1)c)$ and summing over $j = -m, -m + 1, \cdots, m - 1$, the asserted inequality
$$\frac{1}{2m} \sum_{n=0}^{\infty} P(|S_n| < mc) \le \sum_{n=0}^{\infty} P(|S_n| < c)$$
obtains.


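Proof of C. By the SLLN, if $EX > 0$ then $S_n/n \to EX > 0$ a.s. hence $S_n \to +\infty$ a.s. and the random walk cannot be recurrent; similarly for $EX < 0$.

Let $EX = 0$ so that $S_n/n \to 0$ a.s. and, a fortiori, $S_n/n \to 0$ in pr. hence, for given $\epsilon > 0$ and $n \ge n_\epsilon$ sufficiently large, $P(|S_n| < n\epsilon) > 1/2$. Since $P(|S_n| < m) \ge P(|S_n| < n\epsilon)$ for $n \le m/\epsilon$, it follows that, for $m/\epsilon \ge n_\epsilon$,
$$\frac{1}{2m} \sum_{n=0}^{\infty} P(|S_n| < m) \ge \frac{1}{2m} \Big( \frac{m}{\epsilon} - n_\epsilon \Big) \frac{1}{2} = \frac{1}{4\epsilon} - \frac{n_\epsilon}{4m}$$
so that, by b with $c = 1$,
$$\sum_{n=0}^{\infty} P(|S_n| < 1) \ge \limsup_{m \to \infty} \Big( \frac{1}{4\epsilon} - \frac{n_\epsilon}{4m} \Big) = \frac{1}{4\epsilon} \to \infty$$
as $\epsilon \to 0$, B applies and the random walk is recurrent. But then the (nondegenerate at 0) random walk cannot drift to $+\infty$ or to $-\infty$ and the only asymptotic alternative is a.s. $-\infty = \liminf S_n < \limsup S_n = +\infty$. The proof is terminated.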
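
The dichotomy criterion B and the expectation criterion C can be illustrated on the simplest lattice case. The sketch below assumes a $\pm 1$ walk with $P(X = 1) = p$, a special case chosen here for illustration and not treated in the text; the names `return_terms` and `partial_sum` are hypothetical. For this walk $P(|S_n| < 1) = P(S_n = 0)$, which vanishes for odd $n$ and equals $\binom{2k}{k} (p(1-p))^k$ for $n = 2k$. The partial sums grow without bound when $EX = 2p - 1 = 0$ and stay bounded otherwise, the full series being $1/|2p - 1| - 1$ by the binomial series for $(1 - 4x)^{-1/2}$.

```python
def return_terms(p, n_terms):
    """Yield P(S_{2k} = 0), k = 1..n_terms, for steps +1 w.p. p and -1 w.p. 1 - p."""
    pq = p * (1.0 - p)
    term = 1.0  # P(S_0 = 0)
    for k in range(1, n_terms + 1):
        # C(2k, k) (pq)^k = C(2k-2, k-1) (pq)^(k-1) * [2 (2k - 1) / k] * pq
        term *= 2.0 * (2 * k - 1) / k * pq
        yield term


def partial_sum(p, n_terms):
    """Sum of P(|S_n| < 1) over n = 1..2*n_terms (odd n contribute 0)."""
    return sum(return_terms(p, n_terms))


if __name__ == "__main__":
    for n_terms in (10**2, 10**3, 10**4):
        # EX = 0 (divergent partial sums) versus EX = 0.2 (bounded partial sums)
        print(n_terms, partial_sum(0.5, n_terms), partial_sum(0.6, n_terms))
```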


If a random walk obeys the infinite oscillations alternative it is not necessarily recurrent: Symmetric random walks, that is with $\mathcal{L}(X)$ symmetric, obey this alternative and we produce now such random walks which are transient.
Let $\mathcal{L}(X)$ be nondegenerate symmetric stable, that is, with $f(u) = \exp(-c|u|^\gamma)$, $c > 0$, $0 < \gamma \le 2$. According to end of 25.2, $F'$ exists and is continuous and $0 \le F'(x) \le F'(0)$ with $F'(0) > 0$. Furthermore, $\mathcal{L}(X)$ being stable, $\mathcal{L}(S_n/n^{1/\gamma}) = \mathcal{L}(X)$ for every $n = 1, 2, \cdots$. It follows that
$$P(|S_n| < 1) = P(|X| < 1/n^{1/\gamma}) = \int_{-n^{-1/\gamma}}^{n^{-1/\gamma}} F'(x) \, dx \sim 2F'(0) n^{-1/\gamma},$$
so that $\sum_{n=1}^{\infty} P(|S_n| < 1)$ is finite or infinite, according as $\sum_{n=1}^{\infty} n^{-1/\gamma}$ is finite or infinite hence, according as $0 < \gamma < 1$ or $1 \le \gamma \le 2$. Thus, by B, our symmetric random walk is transient for $0 < \gamma < 1$ and recurrent for $1 \le \gamma \le 2$; note that $EX$ does not exist for $0 < \gamma \le 1$.
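
The two recurrent cases with closed-form laws make the comparison with $\sum n^{-1/\gamma}$ concrete. The sketch below is illustrative only (the function names are hypothetical). It uses the standard Cauchy law, $\gamma = 1$ with $f(u) = e^{-|u|}$, for which $P(|S_n| < 1) = \frac{2}{\pi}\arctan(1/n)$, and the standard normal law, $\gamma = 2$ with $f(u) = e^{-u^2/2}$, for which $P(|S_n| < 1) = \mathrm{erf}(1/\sqrt{2n})$. For a transient exponent such as $\gamma = 1/2$ no closed form is assumed, so only the comparison series $\sum n^{-1/\gamma}$ is summed.

```python
from math import atan, erf, pi, sqrt


def cauchy_term(n):
    """P(|S_n| < 1) for standard Cauchy steps (gamma = 1)."""
    return 2.0 / pi * atan(1.0 / n)


def normal_term(n):
    """P(|S_n| < 1) for standard normal steps (gamma = 2)."""
    return erf(1.0 / sqrt(2.0 * n))


def comparison_term(n, gamma):
    """Term n^(-1/gamma) of the comparison series."""
    return n ** (-1.0 / gamma)


def partial_sum(term, n_max):
    """Sum of term(n) for n = 1..n_max."""
    return sum(term(n) for n in range(1, n_max + 1))


if __name__ == "__main__":
    for n_max in (10**2, 10**3, 10**4):
        print(
            n_max,
            partial_sum(cauchy_term, n_max),                        # grows like log n
            partial_sum(normal_term, n_max),                        # grows like sqrt(n)
            partial_sum(lambda n: comparison_term(n, 0.5), n_max),  # stays bounded
        )
```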


Finally, we search for conditions for recurrence or transience in terms of the ch.f. $f$ of $\mathcal{L}(X)$. (So far, they seem to provide the only approach for general random walks in euclidean spaces $R^n$, $n > 1$.) In what follows we use the immediate

PARSEVAL RELATION:
$$\int f_X(u) \, dF_Y(u) = \int f_Y(x) \, dF_X(x)$$
which obtains upon integrating $f_X(u) = \int e^{iux} \, dF_X(x)$ with respect to $F_Y(u)$, and two laws with

triangular pr. density:
$$F'(x) = \frac{1}{h} \Big( 1 - \frac{|x|}{h} \Big) \vee 0, \quad f(u) = 2 \, \frac{1 - \cos hu}{h^2 u^2}, \quad h > 0,$$

triangular ch.f.:
$$f(u) = \Big( 1 - \frac{|u|}{h} \Big) \vee 0, \quad F'(x) = \frac{1}{\pi} \, \frac{1 - \cos hx}{h x^2}, \quad h > 0.$$
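D. CH.F.'S AND DICHOTOMY. Let $f$ be the ch.f. of the common law $\mathcal{L}(X)$.

(i) The random walk is recurrent if there is a $\delta > 0$ with
$$\limsup_{t \uparrow 1} \int_{-\delta}^{\delta} \frac{du}{1 - tf(u)} = \infty.$$
(ii) The random walk is transient if there is a $\delta > 0$ with
$$\sup_{0 < t < 1} \int_{-\delta}^{\delta} \frac{du}{1 - tf(u)} < \infty.$$

Proof. Let $S_0 = 0$.

1°. Parseval relation with triangular ch.f. and $F_{S_n}$ yields
$$P(|S_n| < h) \ge \int_{-h}^{h} \Big( 1 - \frac{|x|}{h} \Big) dF_{S_n}(x) = \frac{1}{\pi} \int \frac{1 - \cos hu}{h u^2} f^n(u) \, du.$$
Since $(1 - \cos hu)/hu^2 \ge ch$ for $|u| < 1/h$ and some $c > 0$ and
$$\Re \frac{1}{1 - tf(u)} = \frac{1 - t \, \Re f(u)}{|1 - tf(u)|^2} \ge 0,$$
it follows that
$$\sum_{n=0}^{\infty} t^n P(|S_n| < h) \ge \frac{ch}{\pi} \int_{-1/h}^{1/h} \Re \frac{1}{1 - tf(u)} \, du = \frac{ch}{\pi} \int_{-1/h}^{1/h} \frac{du}{1 - tf(u)}.$$
Therefore, by hypothesis in (i), for $1/h \ge \delta$,
$$\sum_{n=0}^{\infty} P(|S_n| < h) \ge \frac{ch}{\pi} \limsup_{t \uparrow 1} \int_{-\delta}^{\delta} \frac{du}{1 - tf(u)} = \infty$$
and recurrence obtains by B.

2°. Parseval relation with triangular pr. density yields
$$2 \int \frac{1 - \cos hx}{h^2 x^2} \, dF_{S_n}(x) = \frac{1}{h} \int_{-h}^{h} \Big( 1 - \frac{|u|}{h} \Big) f^n(u) \, du$$
so that, since for $|x| < 2/h$ we have $(1 - \cos hx)/h^2 x^2 > 1/3$,
$$\frac{2}{3} \sum_{n=0}^{\infty} t^n P(|S_n| < 2/h) \le \frac{1}{h} \int_{-h}^{h} \Big( 1 - \frac{|u|}{h} \Big) \frac{du}{1 - tf(u)} \le \frac{1}{h} \int_{-h}^{h} \frac{du}{1 - tf(u)}.$$
Therefore, by hypothesis in (ii), for $h < \delta$,
$$\sum_{n=0}^{\infty} P(|S_n| < 2/h) \le \frac{3}{2h} \sup_{0 < t < 1} \int_{-\delta}^{\delta} \frac{du}{1 - tf(u)} < \infty$$
and transience obtains by B.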


COROLLARY 1. If, for some $\delta > 0$,
$$\int_{-\delta}^{\delta} \Re \frac{1}{1 - f(u)} \, du = \infty$$
then the random walk is recurrent.

COROLLARY 2. If $EX = 0$ then the random walk is recurrent.
Follows by elementary computations from the fact that, given $\epsilon > 0$, $EX = 0$ implies $0 \le 1 - \Re f(u) < \epsilon |u|$ for $|u| < \delta$ sufficiently small.
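
The integral criterion D can be explored numerically for real, even ch.f.'s, where the integrand is real. The sketch below is only an illustration under stated assumptions: the helper names `chf_integral` and `stable_chf`, the midpoint rule, and the choice $\delta = 1$ are not from the text. It uses the symmetric stable ch.f. $f(u) = \exp(-|u|^\gamma)$. For $\gamma = 1$ the integral has the closed form $2[\log(e - t) - \log(1 - t)]$ and grows without bound as $t \uparrow 1$ (recurrence); for $\gamma = 1/2$ it stays bounded (transience).

```python
from math import e, exp, log


def chf_integral(f, t, delta=1.0, n_steps=200_000):
    """Midpoint-rule estimate of the integral of 1 / (1 - t f(u)) over (-delta, delta).

    Assumes f is a real, even ch.f., so the integrand is real and even
    and it suffices to integrate over (0, delta) and double.
    """
    h = delta / n_steps
    total = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * h
        total += 1.0 / (1.0 - t * f(u))
    return 2.0 * h * total


def stable_chf(gamma):
    """Symmetric stable ch.f. u -> exp(-|u|^gamma), with scale c = 1."""
    return lambda u: exp(-abs(u) ** gamma)


if __name__ == "__main__":
    for t in (0.9, 0.99, 0.999, 0.9999):
        print(t, chf_integral(stable_chf(1.0), t), chf_integral(stable_chf(0.5), t))
```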


26.3. Fluctuations; exponential identities. We consider random variables defined on $(S_0, \cdots, S_n)$, say, the number of its positive terms or their maximum or times of occurrence of this maximum, etc. We shall find the explicit form of their laws in terms of "exponential identities." The method will be Fourier analytic. At its core lies a "Wiener-Hopf" factorization technique for the generating characteristic $1/(1 - tf)$ of the random walk $(S_0, S_1, \cdots)$.


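In what follows, $0 < t < 1$, $u \in R$, $A$ denotes a linear Borel set, and we set
$$f_A(u, t) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} \int_{[S_n \in A]} e^{iuS_n} \Big\}$$
hence
$$f_{A^c}(u, t) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} \int_{[S_n \in A^c]} e^{iuS_n} \Big\}.$$

a. FACTORIZATION LEMMA.
$$\frac{1}{1 - tf(u)} = f_A(u, t) f_{A^c}(u, t).$$

Results from
$$\frac{1}{1 - tf(u)} = \exp\Big\{ \log \frac{1}{1 - tf(u)} \Big\} = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} f^n(u) \Big\}$$
and
$$f^n(u) = \int e^{iuS_n} = \int_{[S_n \in A]} e^{iuS_n} + \int_{[S_n \in A^c]} e^{iuS_n}.$$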


We shall be dealing with Fourier-Stieltjes transforms of functions of bounded variation on linear Borel sets, of the form
$$\rho(u) = \int e^{iux} \, dG(x)$$
with same affixes for $\rho$ and $G$, if any. Exactly as for characteristic functions, the uniqueness theorem $\rho \leftrightarrow G$ (up to additive constants) is valid, their products $\rho \rho'$ correspond to compositions $G * G'$ and, clearly, their sums and differences $\rho \pm \rho'$ are transforms of functions of bounded variation $G \pm G'$.

Let
$$P_A(u, t) = \sum_{n=0}^{\infty} p_n(u) t^n, \quad Q_{A^c}(u, t) = \sum_{n=0}^{\infty} q_n(u) t^n$$
where $p_0(u) = q_0(u) = 1$ and, for $n \ge 1$,
$$p_n(u) = \int_A e^{iux} \, dG_n(x), \quad q_n(u) = \int_{A^c} e^{iux} \, dG_n(x).$$

A. UNIQUE FACTORIZATION THEOREM. If
$$\text{(i)} \quad \frac{1}{1 - tf(u)} = P_A(u, t) Q_{A^c}(u, t) \quad \text{or} \quad \text{(ii)} \quad \frac{P_A(u, t)}{1 - tf(u)} = Q_{A^c}(u, t)$$
then, respectively,
$$P_A(u, t) = f_A(u, t) \quad \text{or} \quad P_A(u, t) = 1/f_A(u, t)$$
and
$$Q_{A^c}(u, t) = f_{A^c}(u, t).$$

Proof. Because of a, it suffices to show that if the foregoing relations hold for $P'_A(u, t)$ and $Q'_{A^c}(u, t)$ then, for $n = 0, 1, \cdots$, $p'_n(u) = p_n(u)$ and $q'_n(u) = q_n(u)$, $u \in R$.

Upon identifying the coefficients of the $t^n$, (i) and (ii) then become, respectively,
$$(1) \qquad \sum_{k=0}^{n} p_k(u) q_{n-k}(u) = \sum_{k=0}^{n} p'_k(u) q'_{n-k}(u)$$
or
$$(2) \qquad \sum_{k=0}^{n} p_k(u) q'_{n-k}(u) = \sum_{k=0}^{n} p'_k(u) q_{n-k}(u).$$

We proceed by induction: The assertion is trivially true for $n = 0$. If $p'_k(u) = p_k(u)$ and $q'_k(u) = q_k(u)$ for $k = 1, \cdots, n - 1$ then in (1) and (2) the terms with $k = 1, \cdots, n - 1$ in the left and right sums coincide so that
$$p_n(u) + q_n(u) = p'_n(u) + q'_n(u) \quad \text{or} \quad p_n(u) + q'_n(u) = p'_n(u) + q_n(u).$$
Thus
$$p_n(u) - p'_n(u) = q'_n(u) - q_n(u) \quad \text{or} \quad p_n(u) - p'_n(u) = q_n(u) - q'_n(u),$$
that is, for $u \in R$,
$$\int_A e^{iux} \, dH_n(x) = -\int_{A^c} e^{iux} \, dH_n(x) \quad \text{or} \quad \int_A e^{iux} \, dH_n(x) = \int_{A^c} e^{iux} \, dH_n(x),$$
where the functions $H_n = G_n - G'_n$ are of bounded variation. Therefore, by the uniqueness property for Fourier-Stieltjes transforms, $I_A \, dH_n = \mp I_{A^c} \, dH_n$, so that both sides vanish. The assertion follows.


This proof as well as B are due to Baxter. 


From now on, to simplify the writing, $\int_A e^{iuS_n} = E(e^{iuS_n} I(A))$ will be denoted by $E(e^{iuS_n}; A)$ and, when $A$ is of the form $[\cdots]$ we shall omit the square brackets. The first visit time of $A$ by $(S_1, S_2, \cdots)$ will be called hitting time of $A$. When $\tau$ is a random time, for $\tau = \infty$, $t^{\tau + n} = 0$ $(0 < t < 1)$, $n = 0, 1, \cdots$; note that if $\tau$ is a time of $(S_1, S_2, \cdots)$ then $[\tau = 0] = \emptyset$.


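B. RANDOM TIMES IDENTITIES. Let $\tau$ be a time of $(S_1, S_2, \cdots)$.

(i) The following identities hold:
$$E(t^\tau e^{iuS_\tau}) = \sum_{n=0}^{\infty} t^n E(e^{iuS_n}; \tau = n),$$
$$E\Big( \sum_{n=0}^{\tau - 1} t^n e^{iuS_n} \Big) = \sum_{n=0}^{\infty} t^n E(e^{iuS_n}; \tau > n),$$
$$\frac{1 - E(t^\tau e^{iuS_\tau})}{1 - tf(u)} = E\Big( \sum_{n=0}^{\tau - 1} t^n e^{iuS_n} \Big).$$

(ii) When $\tau = \tau_A$ is hitting time of $A$ then
$$1 - E(t^\tau \exp iuS_\tau) = \frac{1}{f_A(u, t)} = \exp\Big\{ -\sum_{n=1}^{\infty} \frac{t^n}{n} E(e^{iuS_n}; S_n \in A) \Big\},$$
$$E\Big( \sum_{n=0}^{\tau - 1} t^n e^{iuS_n} \Big) = f_{A^c}(u, t) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} E(e^{iuS_n}; S_n \in A^c) \Big\}.$$

Proof. The first identity in (i) results at once from the definitions. The second one results from
$$E\Big( \sum_{n=0}^{\tau - 1} t^n e^{iuS_n} \Big) = \sum_{n=1}^{\infty} E\Big( \sum_{k=0}^{n-1} t^k e^{iuS_k}; \tau = n \Big) = \sum_{n=0}^{\infty} E(t^n e^{iuS_n}; \tau > n).$$

Since $\tau$ is a time of the random walk $(S_1, S_2, \cdots)$,
$$E \sum_{n=\tau}^{\infty} t^n e^{iuS_n} = E\Big[ t^\tau e^{iuS_\tau} \sum_{n=0}^{\infty} t^n e^{iu(S_{\tau+n} - S_\tau)} \Big] = E(t^\tau e^{iuS_\tau}) \, E\Big( \sum_{n=0}^{\infty} t^n e^{iuS_n} \Big) = E(t^\tau e^{iuS_\tau})/(1 - tf(u))$$
and, replacing in
$$1/(1 - tf(u)) = E\Big( \sum_{n=0}^{\tau - 1} t^n e^{iuS_n} \Big) + E\Big( \sum_{n=\tau}^{\infty} t^n e^{iuS_n} \Big),$$
the third identity obtains. By the unique factorization theorem, it implies the two identities in (ii).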


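Our main concern is with $A = (0, \infty)$ hence $A^c = (-\infty, 0]$, and we set $f_+ = f_{(0, \infty)}$, $f_- = f_{(-\infty, 0]}$ so that
$$f_+(u, t) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} E(e^{iuS_n}; S_n > 0) \Big\},$$
$$f_-(u, t) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} E(e^{iuS_n}; S_n \le 0) \Big\}.$$

COROLLARY. If $\tau = \tau_{(0, \infty)}$ then
$$1 - E(t^\tau e^{iuS_\tau}) = \frac{1}{f_+(u, t)}, \quad E\Big( \sum_{n=0}^{\tau - 1} t^n e^{iuS_n} \Big) = f_-(u, t).$$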


C. MAXIMA TIMES AND POSITIVE SUMS IDENTITIES.

(i) If $\tau_n$ is the time of occurrence of the first maximum of $(S_0, S_1, \cdots, S_n)$ then
$$E(e^{iuS_n}; \tau_n = k) = E(e^{iuS_k}; \tau_k = k) \, E(e^{iuS_{n-k}}; \tau_{n-k} = 0),$$
$$\sum_{n=0}^{\infty} t^n E(e^{iuS_n}; \tau_n = n) = f_+(u, t),$$
$$\sum_{n=0}^{\infty} t^n E(e^{iuS_n}; \tau_n = 0) = f_-(u, t),$$
$$\sum_{n=0}^{\infty} t^n E(s^{\tau_n} e^{iuS_n}) = f_+(u, st) f_-(u, t), \quad 0 < s \le 1.$$

(ii) If $\nu_n$ is the number of positive sums in $(S_0, S_1, \cdots, S_n)$ then the above identities remain valid when therein $\tau$ is replaced by $\nu$ with same subscripts.

(iii) All above identities, including those in the above Corollary, remain valid when $(0, \infty)$ is replaced by $[0, \infty)$ provided $\tau_n$ is the time of the last maximum of $(S_0, S_1, \cdots, S_n)$ and $\nu_n$ is the number of nonnegative sums in $(S_0, S_1, \cdots, S_n)$ while $S_n > 0$ and $S_n \le 0$ are replaced by $S_n \ge 0$ and $S_n < 0$ in $f_+$ and in $f_-$, respectively.

Proof. The identities in (i) are based upon a "sample space factorization": If $M_n = \max(S_0, \cdots, S_n)$ then the first time this maximum occurs is $\tau_n = \min\{0 \le k \le n : S_k = M_n\}$ and, by the very definition of $\tau_n = \tau_n(X_1, \cdots, X_n)$,
$$[\tau_n(X_1, \cdots, X_n) = k] = [\tau_k(X_1, \cdots, X_k) = k] \, [\tau_{n-k}(X_{k+1}, \cdots, X_n) = 0].$$

Since the last two events are independent and so are $S_k$ and $S_n - S_k$, while $S_n - S_k$ has the same distribution as $S_{n-k}$, it follows that
$$E(e^{iuS_n}; \tau_n = k) = E(e^{iuS_k}; \tau_k = k) \cdot E(e^{iuS_{n-k}}; \tau_{n-k} = 0).$$

Thus, upon multiplying by $s^k t^n$ and summing over $0 \le k \le n < \infty$,
$$\sum_{n=0}^{\infty} t^n E(s^{\tau_n} e^{iuS_n}) = P(u, st) Q(u, t)$$
where
$$P(u, t) = \sum_{n=0}^{\infty} t^n E(e^{iuS_n}; \tau_n = n), \quad Q(u, t) = \sum_{n=0}^{\infty} t^n E(e^{iuS_n}; \tau_n = 0).$$
For $s = 1$, the preceding relation becomes
$$\frac{1}{1 - tf(u)} = P(u, t) Q(u, t),$$
while $\tau_n = n$ implies $S_n > 0$ and $\tau_n = 0$ implies $S_n \le 0$. The unique factorization theorem applies so that
$$P(u, t) = f_+(u, t), \quad Q(u, t) = f_-(u, t)$$
and the identities in (i) follow.

By Andersen equivalence, the sample space factorization for the times of first maxima is equivalent to the far from obvious sample space factorization for the numbers of positive sums:
$$[\nu_n(X_1, \cdots, X_n) = k] = [\nu_k(X_1, \cdots, X_k) = k] \, [\nu_{n-k}(X_{k+1}, \cdots, X_n) = 0],$$
and (ii) for positive sums identities follows.

Finally, by using in the unique factorization theorem $[0, \infty)$ in lieu of $(0, \infty)$, (iii) results from the fact that all the foregoing arguments continue to apply to the corresponding $\tau_n$ and $\nu_n$.


The following important identity, known in various guises and with various degrees of generality, has its origin in the basic Spitzer identity below (Pollaczek, Spitzer, Kemperman, Port, etc.).

D. MAXIMUM TIME AND VALUE IDENTITY. If $M_n = \max(S_0, \cdots, S_n)$ and $\tau_n = \min\{0 \le k \le n : S_k = M_n\}$, then
$$\sum_{n=0}^{\infty} t^n E(s^{\tau_n} e^{iuS_n + ivM_n}) = f_+(u + v, st) f_-(u, t), \quad 0 < s \le 1.$$

Proof. Since $\tau_n = k \Rightarrow M_n = S_k$, by sample space factorization,
$$E(e^{iuS_n + ivM_n}; \tau_n = k) = E(e^{i(u+v)S_k + iu(S_n - S_k)}; \tau_n = k) = E(e^{i(u+v)S_k}; \tau_k = k) \cdot E(e^{iuS_{n-k}}; \tau_{n-k} = 0).$$
Upon multiplying by $s^k t^n$ and summing for $0 \le k \le n < \infty$, it follows that
$$\sum_{n=0}^{\infty} t^n E(s^{\tau_n} e^{iuS_n + ivM_n}) = P(u + v, st) Q(u, t),$$
where $P$ and $Q$ are the functions introduced in the preceding proof and, as therein, the unique factorization theorem yields the asserted identity.

Particular cases. 1°. For $v = 0$ we obtain the last identity in C(i).

2°. For $s = 1$ and $u = 0$, changing $v$ into $u$, we obtain the Pollaczek-Spitzer identity:
$$\sum_{n=0}^{\infty} t^n E(e^{iuM_n}) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} E(e^{iuS_n^+}) \Big\}.$$
It was first discovered by Pollaczek but remained unnoticed until rediscovered by Spitzer.

3°. For $s = 1$, interchanging $u$ and $v$, we obtain
$$\sum_{n=0}^{\infty} t^n E(e^{iuM_n + ivS_n}) = f_+(u + v, t) f_-(v, t).$$
Upon setting $w = u + v$ then changing $w$ into $u$, and $v$ into $-v$, it becomes
$$\sum_{n=0}^{\infty} t^n E(e^{iuM_n + iv(M_n - S_n)}) = f_+(u, t) f_-(-v, t).$$
Finally, upon multiplying by
$$1 = (1 - t) \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} P(S_n > 0) \Big\} \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} P(S_n \le 0) \Big\}$$
we obtain
$$\sum_{n=0}^{\infty} t^n E(e^{iuM_n + iv(M_n - S_n)}) = (1 - t) \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} \big( Ee^{iuS_n^+} + Ee^{ivS_n^-} \big) \Big\}$$
or
$$\sum_{n=0}^{\infty} t^n E(e^{iuM_n + iv(M_n - S_n)}) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} \big( Ee^{iuS_n^+} + Ee^{ivS_n^-} - 1 \big) \Big\},$$
and this is the basic Spitzer identity in its initial form.
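
The Pollaczek-Spitzer identity can be checked coefficient by coefficient on a small example. The sketch below is an illustration, not part of the text: it assumes symmetric $\pm 1$ steps, replaces $e^{iu}$ by a real number $x$ with $0 < x \le 1$ (that is, $u = i\lambda$ with $x = e^{-\lambda}$, which is covered by the extension to $\Im z \ge 0$), and the function names are hypothetical. Writing $c_n = E x^{M_n}$ and $a_n = E x^{S_n^+}$, differentiating $\sum c_n t^n = \exp\{\sum a_n t^n / n\}$ in $t$ and identifying coefficients gives $n c_n = \sum_{k=1}^{n} a_k c_{n-k}$, which the code verifies exactly by enumerating all paths.

```python
from fractions import Fraction
from itertools import product


def expectations(n_max, x):
    """Exact E[x**M_n] and E[x**max(S_n, 0)] for symmetric +-1 steps, n = 0..n_max."""
    e_max = [Fraction(1)]  # M_0 = 0
    e_pos = [Fraction(1)]  # S_0 = 0
    for n in range(1, n_max + 1):
        tot_max = Fraction(0)
        tot_pos = Fraction(0)
        for steps in product((1, -1), repeat=n):
            s = m = 0
            for step in steps:
                s += step
                m = max(m, s)  # running maximum of S_0, ..., S_n
            tot_max += x ** m
            tot_pos += x ** max(s, 0)
        e_max.append(tot_max / 2 ** n)
        e_pos.append(tot_pos / 2 ** n)
    return e_max, e_pos


def identity_holds(n_max, x):
    """Check n c_n = sum_{k=1}^{n} a_k c_{n-k} for n = 1..n_max, exactly."""
    c, a = expectations(n_max, x)
    return all(
        n * c[n] == sum(a[k] * c[n - k] for k in range(1, n + 1))
        for n in range(1, n_max + 1)
    )


if __name__ == "__main__":
    print(identity_holds(8, Fraction(1, 2)))
```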


EXTENSION. The basic exponential factors $f_+(u, t)$ and $f_-(u, t)$ may still have meaning when $u \in R$ is replaced by complex $z$. In fact,
$$f_+(z, t) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} E(e^{izS_n}; S_n > 0) \Big\}$$
is bounded and continuous for $\Im z \ge 0$ and regular for $\Im z > 0$,
$$f_-(z, t) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} E(e^{izS_n}; S_n \le 0) \Big\}$$
is bounded and continuous for $\Im z \le 0$ and regular for $\Im z < 0$.

Thus the question arises whether the identities so far obtained remain valid for such $z$. The answer is in the affirmative for those identities in which figure only either $f_+$ or $f_-$; when both occur then, clearly, we must have $\Im z = 0$, that is, $z = u \in R$. These assertions result at once from the unicity lemma 15.2d, which yields (i) and (ii) below, while for (iii) we also use the fact that all the r.v.'s therein are nonnegative.

E. EXTENDED IDENTITIES. The following identities are valid:
(i) For $\Im z \ge 0$,
$$1 - E(t^\tau e^{izS_\tau}) = \frac{1}{f_+(z, t)},$$
$$\sum_{n=0}^{\infty} t^n E(e^{izS_n}; \tau_n = n) = f_+(z, t),$$
$$\sum_{n=0}^{\infty} t^n E(e^{izS_n}; \nu_n = n) = f_+(z, t).$$
(ii) For $\Im z \le 0$,
$$E\Big( \sum_{n=0}^{\tau - 1} t^n e^{izS_n} \Big) = f_-(z, t),$$
$$\sum_{n=0}^{\infty} t^n E(e^{izS_n}; \tau_n = 0) = f_-(z, t),$$
$$\sum_{n=0}^{\infty} t^n E(e^{izS_n}; \nu_n = 0) = f_-(z, t).$$
(iii) For $\Im z \ge 0$, $\Im z' \ge 0$,
$$\sum_{n=0}^{\infty} t^n E(e^{izM_n + iz'(M_n - S_n)}) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} \big( E(e^{izS_n^+}) + E(e^{iz'S_n^-}) - 1 \big) \Big\}$$
and, in particular, for $\Im z \ge 0$,
$$\sum_{n=0}^{\infty} t^n E(e^{izM_n}) = \exp\Big\{ \sum_{n=1}^{\infty} \frac{t^n}{n} E e^{izS_n^+} \Big\}.$$


Remark. In fact, the argument used for the unicity lemma 15.2d permits to prove simultaneously identities and a unique factorization theorem (Pollaczek, Ray, Kemperman). To fix the ideas, replace $u$ by $z$ in $P(u, t)$ and $Q(u, t)$ used in the proof of C:
$$P(z, t) = \sum_{n=0}^{\infty} t^n E(e^{izS_n}; \tau_n = n), \quad Q(z, t) = \sum_{n=0}^{\infty} t^n E(e^{izS_n}; \tau_n = 0).$$

Note that $P(z, t)$ like $f_+(z, t)$ ($Q(z, t)$ like $f_-(z, t)$) is bounded and continuous for $\Im z \ge 0$ ($\Im z \le 0$) and regular for $\Im z > 0$ ($\Im z < 0$) while for $\Im z = 0$
$$f_+(z, t) f_-(z, t) = \frac{1}{1 - tf(z)} = P(z, t) Q(z, t).$$
Therefore, for $\Im z = 0$,
$$g(z) = \frac{P(z, t)}{f_+(z, t)} = \frac{f_-(z, t)}{Q(z, t)}$$
where the first (second) ratio is bounded and continuous for $\Im z \ge 0$ ($\Im z \le 0$) and regular for $\Im z > 0$ ($\Im z < 0$). Thus, the two ratios are restrictions of a same bounded entire function $g(z)$ to $\Im z \ge 0$ and to $\Im z \le 0$, respectively. By Liouville's theorem, $g(z)$ is a constant. But
$$\frac{P(z, t)}{f_+(z, t)} \to 1 \quad \text{as} \quad \Im z \to +\infty$$
so that
$$P(z, t) = f_+(z, t) \text{ for } \Im z \ge 0, \quad Q(z, t) = f_-(z, t) \text{ for } \Im z \le 0.$$

This proves the corresponding extended identities together with unique factorization.


All preceding identities in $f_+$ and $f_-$ which are in terms of exponentials, naturally, are called exponential identities. Their striking and unexpected feature is that the distributions of various fluctuation random variables are in terms of individual terms $S_n$ of the random walk. The sample space factorizations
$$[\tau_n(X_1, \cdots, X_n) = k] = [\tau_k(X_1, \cdots, X_k) = k] \, [\tau_{n-k}(X_{k+1}, \cdots, X_n) = 0]$$
and the equivalent one with $\tau$ replaced by $\nu$ are, naturally, called extreme factorizations. Their striking and unexpected feature is that the distributions of $\tau_n$ and of $\nu_n$ are determined by the pr.'s of their extreme values 0 and $n$.
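
Both the Andersen equivalence and the extreme factorization can be verified exactly on small cases by enumerating paths. The sketch below assumes symmetric $\pm 1$ steps, an illustrative choice (both facts hold for general iid steps); the function name `laws` is hypothetical. It computes the exact laws of $\tau_n$, the index of the first maximum of $(S_0, \cdots, S_n)$, and of $\nu_n$, the number of strictly positive sums, so that one can compare them and check $P(\tau_n = k) = P(\tau_k = k) P(\tau_{n-k} = 0)$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def laws(n):
    """Exact laws of tau_n and nu_n for symmetric +-1 steps.

    tau_n: index of the first maximum of S_0, ..., S_n.
    nu_n:  number of strictly positive sums among S_1, ..., S_n.
    """
    tau, nu = Counter(), Counter()
    weight = Fraction(1, 2 ** n)
    for steps in product((1, -1), repeat=n):
        s, best, arg, pos = 0, 0, 0, 0
        for i, step in enumerate(steps, start=1):
            s += step
            if s > best:  # strict inequality keeps the FIRST index of the maximum
                best, arg = s, i
            if s > 0:
                pos += 1
        tau[arg] += weight
        nu[pos] += weight
    return tau, nu


if __name__ == "__main__":
    n = 6
    tau_n, nu_n = laws(n)
    for k in range(n + 1):
        factored = laws(k)[0][k] * laws(n - k)[0][0]
        print(k, tau_n[k], nu_n[k], factored)
```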


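26.4 Fluctuations; asymptotic behaviour. We relate now the asymptotic behaviour of the random walk to that of fluctuations r.v.'s $\tau_A$, $\tau_n$, $\nu_n$, $M_n$; $A$ denotes a linear Borel set.


a. HITTING TIME LEMMA. If $\tau_A$ is hitting time of $A$ then
$$\text{(i)} \quad 1 - Et^{\tau_A} = \exp\Big\{ -\sum_{n=1}^{\infty} \frac{t^n}{n} P(S_n \in A) \Big\},$$
$$\text{(ii)} \quad P(\tau_A = \infty) = \exp\Big\{ -\sum_{n=1}^{\infty} P(S_n \in A)/n \Big\},$$
$$\text{(iii)} \quad E\tau_A = \exp\Big\{ \sum_{n=1}^{\infty} P(S_n \in A^c)/n \Big\} + \infty \cdot P(\tau_A = \infty) \quad (\infty \cdot 0 = 0).$$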




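Proof. We use the elementary proposition: If the $a_n \ge 0$ and $\sum_{n=0}^{\infty} a_n t^n$ converges for $0 < t < 1$, then $\sum_{n=0}^{\infty} a_n t^n \to \sum_{n=0}^{\infty} a_n \le \infty$ as $t \uparrow 1$.

Set $\tau = \tau_A$. Identity (i) results from the first one in 26.3B(ii) with $u = 0$. Identity (ii) follows from (i) by letting $t \uparrow 1$ in
$$Et^\tau = \sum_{n=1}^{\infty} t^n P(\tau = n) \to \sum_{n=1}^{\infty} P(\tau = n) = P(\tau < \infty)$$
so that
$$P(\tau = \infty) = \lim_{t \uparrow 1} (1 - Et^\tau) = \exp\Big\{ -\sum_{n=1}^{\infty} P(S_n \in A)/n \Big\}.$$

Identity (iii) results from the second identities in 26.3B(i) and (ii) with $u = 0$, by letting $t \uparrow 1$ so that
$$\exp\Big\{ \sum_{n=1}^{\infty} t^n P(S_n \in A^c)/n \Big\} = E \sum_{n=0}^{\tau - 1} t^n = \sum_{n=0}^{\infty} t^n P(\tau > n) \to \sum_{n=0}^{\infty} P(\tau > n)$$
and
$$E\tau = \sum_{n=0}^{\infty} P(\tau > n) + \infty \cdot P(\tau = \infty).$$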
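
Identity (ii) can be tested against a case where the left side is known independently. The sketch below assumes a $\pm 1$ walk with $P(X = 1) = p < 1/2$, an illustrative special case not in the text; the function names and the truncation level `n_max` are hypothetical. By the classical gambler's ruin computation, such a walk ever reaches $(0, \infty)$ with probability $p/(1 - p)$, so $P(\tau_{(0,\infty)} = \infty) = 1 - p/(1 - p)$, and the code compares this with $\exp\{-\sum_n P(S_n > 0)/n\}$, the series being truncated where its terms are negligible.

```python
from math import exp


def prob_positive(n_max, p):
    """Return [P(S_n > 0) for n = 1..n_max] for steps +1 w.p. p and -1 w.p. 1 - p."""
    q = 1.0 - p
    dist = [1.0]  # dist[j] = P(exactly j up-steps among the first n steps)
    out = []
    for n in range(1, n_max + 1):
        new = [0.0] * (n + 1)
        for j, w in enumerate(dist):
            new[j] += w * q      # next step is -1
            new[j + 1] += w * p  # next step is +1
        dist = new
        # S_n = 2j - n > 0 exactly when 2j > n
        out.append(sum(w for j, w in enumerate(dist) if 2 * j > n))
    return out


def escape_probability(p, n_max=600):
    """Truncated value of exp(-sum_{n>=1} P(S_n > 0) / n)."""
    probs = prob_positive(n_max, p)
    return exp(-sum(pr / n for n, pr in enumerate(probs, start=1)))


if __name__ == "__main__":
    p = 0.3
    print(escape_probability(p), 1.0 - p / (1.0 - p))
```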


b. FINITE INTERVAL LEMMA. Let $J$ be a finite interval and let $\tau$ be the hitting time of $J^c$. Then $E\tau^r < \infty$ for $r > 0$, and $ES_\tau = E\tau \cdot EX$ exists (and is finite) if and only if $EX$ exists and is finite.


The first assertion is Stein's lemma and the second one is Wald's relation, both obtained before general fluctuation theory.


Proof. The second relation was proved in 26.1 and it remains to prove the first one. To fix the ideas, let $J = [a, b]$. Since the only asymptotic alternatives are: a.s. $S_n \to -\infty$ or to $+\infty$ or $-\infty = \liminf S_n < \limsup S_n = +\infty$, there is an integer $m$ such that $p = P(|S_m| \le b - a) < 1$. But $[\tau > n + m]$ implies occurrence of independent events $[\tau > n]$ and $[|S_{n+m} - S_n| \le b - a]$, where $S_{n+m} - S_n$ has the same distribution as $S_m$. Therefore,
$$P(\tau > n + m) \le p \, P(\tau > n)$$
and, by induction,
$$P(\tau > km) \le p^k, \quad k = 1, 2, \cdots.$$

Therefore, $P(\tau > n) \le p^{-1} (p^{1/m})^n$ so that the series $\sum_{n=0}^{\infty} t^n P(\tau > n) < \infty$ for $|t| < t_0 = p^{-1/m} > 1$. The first assertion follows.


A. TRANSLATION INVARIANCE THEOREM.

(i) If $J$ is a finite interval then $\sum_{n=1}^{\infty} P(S_n \in J)/n < \infty$.

(ii) $\sum_{n=1}^{\infty} P(S_n = x)/n < \infty$, $\sum_{n=1}^{\infty} P(S_n < x)/n + \sum_{n=1}^{\infty} P(S_n > x)/n = \infty$, for all $x \in R$.

(iii) Either $\sum_{n=1}^{\infty} P(S_n \lessgtr x)/n < \infty$ for all $x \in R$ or $\sum_{n=1}^{\infty} P(S_n \lessgtr x)/n = \infty$ for all $x \in R$, where "$\lessgtr$" stands for any one of the following inequality signs: "$<$", "$\le$", "$>$", "$\ge$".

(iv) If $\tau_x = \tau_{A_x}$ is the hitting time of $A_x$ where $A_x$ stands for any one of the following intervals: $(x, \infty)$, $[x, \infty)$, $(-\infty, x]$, $(-\infty, x)$, then
$$P(\tau_x < \infty) = 1 \iff P(\tau_0 < \infty) = 1$$
for all $x \in R$.

In particular, $P(\tau_x < \infty) = 1$ if and only if $\sum_{n=1}^{\infty} P(S_n \in A_0)/n = \infty$.

Proof. Assertion (i) results from a(iii) and b. Assertion (ii) results from (i) and the fact that the sum of the three series in (ii) is $\sum_{n=1}^{\infty} 1/n = \infty$.

Assertion (iii) for, say, "$\ge$" and $x > 0$, results from $[0, \infty) = [0, x) + [x, \infty)$ by
$$\sum_{n=1}^{\infty} P(S_n \ge 0)/n = \sum_{n=1}^{\infty} P(S_n \in [0, x))/n + \sum_{n=1}^{\infty} P(S_n \ge x)/n$$
where, by (i), the second series converges; similarly for the other choices of "$\lessgtr$" and $x \in R$. Finally, assertion (iv) follows, by a(ii), from (iii).

B. THREE ALTERNATIVES CRITERIA.

($a_1$). The following properties are equivalent:
$$S_n \to -\infty \text{ a.s.}, \quad P(\tau_{(0, \infty)} < \infty) < 1, \quad \sum_{n=1}^{\infty} P(S_n > 0)/n < \infty, \quad EX < 0 \text{ when } EX \text{ exists}.$$

($a_2$). The following properties are equivalent:
$$S_n \to +\infty \text{ a.s.}, \quad P(\tau_{(-\infty, 0)} < \infty) < 1, \quad \sum_{n=1}^{\infty} P(S_n < 0)/n < \infty, \quad EX > 0 \text{ when } EX \text{ exists}.$$

($a_3$). The following properties are equivalent:
$$-\infty = \liminf S_n < \limsup S_n = +\infty \text{ a.s.},$$
$$P(\tau_{(0, \infty)} < \infty) = 1 \text{ and } P(\tau_{(-\infty, 0)} < \infty) = 1,$$
$$\sum_{n=1}^{\infty} P(S_n > 0)/n = \infty \text{ and } \sum_{n=1}^{\infty} P(S_n < 0)/n = \infty,$$
$$EX = 0 \text{ when } EX \text{ exists}.$$

Proof. Assertions in ($a_3$) follow upon excluding the only two other alternatives ($a_1$) and ($a_2$). Assertions in ($a_2$) result from those in ($a_1$) by changing $X$ into $-X$ hence every $S_n$ into $-S_n$. Thus, it suffices to prove those in ($a_1$).

If $P(S_n \to -\infty) = 1$, we cannot have $P(\tau_{(0, \infty)} < \infty) = 1$ for then, by A(iv), $P(\tau_{(x, \infty)} < \infty) = 1$ for $x$ as large as we wish hence $\limsup S_n = \infty$ a.s. Thus $P(\tau_{(0, \infty)} < \infty) < 1$ which, by a(ii), is equivalent to $\sum_{n=1}^{\infty} P(S_n > 0)/n < \infty$, and the first three properties in ($a_1$) are equivalent.

Finally, by 26.2C Corollary, when $EX$ exists then $S_n \to -\infty$ a.s. $\iff EX < 0$. The proof is terminated.


COROLLARY. $P(\limsup S_n = +\infty) = 0$ or 1 according as (i) $P(\tau_{(0, \infty)} < \infty) < 1$ or $= 1$, (ii) $\sum_{n=1}^{\infty} P(S_n > 0)/n < \infty$ or $= \infty$, (iii) $EX < 0$ or $EX \ge 0$ when $EX$ exists.


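C. ASYMPTOTIC BEHAVIOUR THEOREM.
(i) If $\sum_{n=1}^{\infty} P(S_n > 0)/n = \infty$ then $M_n \to \infty$ a.s., $\nu_n \to \infty$ a.s., $\tau_n \to \infty$ a.s.
(ii) If $\sum_{n=1}^{\infty} P(S_n > 0)/n < \infty$ then a.s. $M_n \to M_\infty$ with i.d. ch.f.
$$Ee^{iuM_\infty} = \exp\Big\{ \sum_{n=1}^{\infty} (Ee^{iuS_n^+} - 1)/n \Big\},$$
$\nu_n \to \nu_\infty$, $\tau_n \to \tau_\infty$ with common generating function
$$Et^{\nu_\infty} = Et^{\tau_\infty} = \exp\Big\{ \sum_{n=1}^{\infty} (t^n - 1) P(S_n > 0)/n \Big\}.$$

Note that the hypotheses in (i) and (ii) being contrary of each other, are equivalent to their conclusions.


Proof. By the above Corollary $\sum_{n=1}^{\infty} P(S_n > 0)/n = \infty$ is equivalent to $\limsup S_n = +\infty$ a.s. hence $M_\infty = \sup S_n^+ = \limsup S_n = +\infty$ a.s. It follows that
$$P(\nu_{n+1} = \nu_n + 1 \text{ i.o.}) = P(\tau_n = n \text{ i.o.}) = 1$$
so that $\nu_n \to \infty$ a.s. and $\tau_n \to \infty$ a.s. Assertions (i) are proved.

By the same Corollary, $\sum_{n=1}^{\infty} P(S_n > 0)/n < \infty$ is equivalent to $\limsup S_n < \infty$ a.s., in fact, to $\lim S_n = -\infty$ a.s. But, by definition of limsup, $\limsup S_n < \infty$ a.s. implies $M_n \uparrow M_\infty < \infty$ a.s. and $P(\nu_{n+1} \ne \nu_n \text{ i.o.}) = P(\tau_{n+1} \ne \tau_n \text{ i.o.}) = 0$ hence $\nu_n \to \nu_\infty < \infty$ a.s. and $\tau_n \to \tau_\infty < \infty$ a.s.
We use now the classical Abel theorem: If the complex $a_n \to a$ finite then $(1 - t) \sum_{n=0}^{\infty} a_n t^n \to a$ as $t \uparrow 1$.
Since $M_n \uparrow M_\infty < \infty$ a.s., $Ee^{iuM_n} \to Ee^{iuM_\infty}$ hence, by the Pollaczek-Spitzer identity, as $t \uparrow 1$,
$$Ee^{iuM_\infty} \leftarrow (1 - t) \sum_{n=0}^{\infty} t^n Ee^{iuM_n} = \exp\Big\{ -\sum_{n=1}^{\infty} t^n/n \Big\} \exp\Big\{ \sum_{n=1}^{\infty} t^n E(e^{iuS_n^+})/n \Big\}$$
$$= \exp\Big\{ \sum_{n=1}^{\infty} t^n (Ee^{iuS_n^+} - 1)/n \Big\} \to \exp\Big\{ \sum_{n=1}^{\infty} (Ee^{iuS_n^+} - 1)/n \Big\}.$$
The last limit is an i.d. ch.f., since it is a product of i.d. ch.f.'s $e^{\psi_n}$ with
$$\psi_n(u) = \frac{1}{n} \int (e^{iux} - 1) \, dF_{S_n^+}(x).$$
The first assertion in (ii) is proved.

Since $P(\nu_n = k) = P(\tau_n = k)$ for $k = 0, 1, \cdots, n$ and $n = 1, 2, \cdots$, $\nu_n \to \nu_\infty < \infty$ a.s. and $\tau_n \to \tau_\infty < \infty$ a.s. imply that $P(\nu_\infty = k) = P(\tau_\infty = k)$ for $k = 0, 1, \cdots$. Thus, to find the generating function of $\tau_\infty$ it suffices to find that of $\nu_\infty$: $Et^{\nu_\infty} = \sum_{k=0}^{\infty} t^k P(\nu_\infty = k)$. Since $\nu_n \to \nu_\infty < \infty$ a.s., it follows, by extreme factorization, that
$$P(\nu_\infty = k) \leftarrow P(\nu_n = k) = P(\nu_k = k) P(\nu_{n-k} = 0) \to P(\nu_k = k) P(\nu_\infty = 0).$$
But, by a(ii),
$$P(\nu_\infty = 0) = P(\tau_{(0, \infty)} = \infty) = \exp\Big\{ -\sum_{n=1}^{\infty} P(S_n > 0)/n \Big\}$$
while, by 26.3C(ii) and the second relation in (i) therein with $u = 0$,
$$\sum_{k=0}^{\infty} t^k P(\nu_k = k) = \exp\Big\{ \sum_{n=1}^{\infty} t^n P(S_n > 0)/n \Big\}.$$

Therefore,
$$Et^{\nu_\infty} = \sum_{k=0}^{\infty} t^k P(\nu_k = k) P(\nu_\infty = 0) = \exp\Big\{ \sum_{n=1}^{\infty} (t^n - 1) P(S_n > 0)/n \Big\},$$
and the proof is terminated.

This basic Spitzer theorem has the same striking and unexpected feature as the exponential identities: The limit distributions are in terms of individual sums $S_n$.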


COMPLEMENTS AND DETAILS 


As throughout this chapter, $X_1, X_2, \cdots$ are iid summands with common nondegenerate law $\mathcal{L}(X)$, d.f. $F$, ch.f. $f$, and $S_0 = 0$, $S_n = X_1 + \cdots + X_n$. Slowly varying functions will be denoted by $h(x)$ with or without affixes.

1. Let $F_k$, $k = 1, 2$, be d.f.'s and let $x \to \infty$.

If $1 - F_k(x) \sim x^{-\alpha} h_k(x)$ then $1 - (F_1 * F_2)(x) \sim x^{-\alpha} (h_1(x) + h_2(x))$.

If $1 - F(x) \sim x^{-\alpha} h(x)$ then $1 - F_{S_n}(x) \sim n x^{-\alpha} h(x)$. Deduce similar propositions for $F_k(-x)$, $F(-x)$.

2. Extrema. Let $Y_n = \max_{1 \le k \le n} X_k$.

If $P(X < c) = 1$ for some constant $c$, then $\mathcal{L}(Y_n) \to \mathcal{L}(c)$.

If $P(X < x) < 1$ for every $x \in R$ then there exist scale factors $b_n > 0$ such that $\mathcal{L}(Y_n/b_n) \to \mathcal{L}(Y)$ nondegenerate if and only if $1 - F(x)$ varies regularly with exponent $-\alpha < 0$, and then $F_Y(x) = 0$ or $e^{-c x^{-\alpha}}$ with $c > 0$, according as $x < 0$ or $x > 0$. What about $Z_n = \min_{1 \le k \le n} X_k$?

3. Let $S$ be a $(\lambda, f)$-compound Poisson: $f_S = e^{\lambda (f - 1)}$. Let $x \to \infty$.

If $1 - F(x) \sim x^{-\alpha} h(x)$ then $1 - F_S(x) \sim \lambda x^{-\alpha} h(x)$. Is there a similar proposition about $F(-x)$ and $F_S(-x)$?

4. Let $F$ be an i.d. d.f. with $f = e^\psi$, $\psi = (\alpha, \beta^2, L)$. Let $x \to \infty$.

If $L(x) = x^{-\alpha} h(x)$ then $1 - F(x) \sim L(x)$. Is there a similar proposition about $L(-x)$ and $F(-x)$?

5. Norming. Let $\mathcal{L}(X)$ be attracted by a nondegenerate stable $\mathcal{L}_\gamma$, $0 < \gamma \le 2$, that is, $\mathcal{L}(S_n/b_n - a_n) \to \mathcal{L}_\gamma$ for suitable $b_n > 0$ and $a_n$.


an, 
(a) Let p(t) = f x°dF (x) and q(t) = 1 — F(«) + F(—*). Let t— o and 
—t 
use 25.1.D. 
mf 
Ifr < y then oa) Jel < | x |"dF(x) > 


If r > y when y < 2 then “le | « |dF(x) ~ 


2-7 


* 


YT 


y ttq(t). Deduce that 
r— ¥ 
E| X |" < © forr < yand E| X |" = & forr > y when y < 2. 

(b) Centering constants. If O< y<1wecantakea,=0. Ifl<y< 2, 
we can take a, = EX: Use (a). 

(c) Scale factors. All suitable scale factors 6, are of the form 4, = nVVA(n): 
Use |fe(u/bn)| = e~/“/7(1 + o(1)), replace m by wk then 1/dn, by (4n/bnz)/Ony 
note that o(1) — 0 uniformly in every given finite interval, show that if the se- 
quence (bn/bnx) is not bounded then e~** = 1—impossible, and finally dn./bn > 
Ruy, 

6. Standard domains of attraction. We say that $\mathcal{L}(X)$ belongs to the standard domain of attraction of a nondegenerate stable $\mathcal{L}_\gamma$ if $b_n = b n^{1/\gamma} > 0$ are suitable scale factors. (The usual but confusing term is "normal" not "standard.")

$\mathcal{L}(X)$ belongs to the standard domain of attraction of a nondegenerate stable $\mathcal{L}_\gamma$ with $0 < \gamma < 2$, if and only if, as $x \to \infty$, $x^\gamma (1 - F(x)) \to b^\gamma c p$ and $x^\gamma F(-x) \to b^\gamma c q$, $c > 0$, $p, q \ge 0$.

$\mathcal{L}(X)$ belongs to the standard domain of attraction of $\mathcal{N}(0, 1)$ if and only if $EX^2 < \infty$, and then $b_n = \sigma n^{1/2}$ with $\sigma = \sigma X$.

7. Estimates for $E|S_n|$. Let $\mathcal{L}(X)$ with $EX = 0$ belong to the standard domain of attraction of $\mathcal{L}_\gamma(Y)$ with $1 < \gamma < 2$.

(a) $\mathcal{L}(S_n/n^{1/\gamma}) \to \mathcal{L}(Y)$, $F(-x) \le c x^{-\gamma}$ and $1 - F(x) \le c x^{-\gamma}$ for some constant $c > 0$.

(b) There is a positive $a$ independent of $n$ such that for $x \ge x_0$ independent of $n$, $P(|S_n|/n^{1/\gamma} > x) \le a/x^\gamma$.

(c) For $0 \le r < \gamma$ there is a positive $b = b(r)$ independent of $n$ such that $E(|S_n/n^{1/\gamma}|^r) \le b$.
(d) $E(S_n/n^{1/\gamma}) \to EY$ and $E|S_n/n^{1/\gamma}|^r \to E|Y|^r$ for $0 \le r < \gamma$.

8. Partial attraction. $\mathcal{L}(X)$ is said to belong to the domain of partial attraction
of a nondegenerate $\mathcal{L}(Y)$ if there is a subsequence $(k_n)$ of integers such that, for
suitable $b_n > 0$ and $a_n$, $\mathcal{L}(S_{k_n}/b_n - a_n) \to \mathcal{L}(Y)$. It is a property of types of
laws. Discuss the propositions below in whichever order is preferred.

(a) Every $\mathcal{L}(X)$ belongs to the domain of partial attraction of either no type
or of one type or of an uncountable family of types.

(b) If $\mathcal{L}(X)$ belongs to the domain of partial attraction of only one type, then
this type is stable.

(c) A symmetric distribution with slowly varying two-sided tail belongs to no
domain of partial attraction.

(d) If $f$ belongs to the domain of attraction of an i.d. $e^{\psi}$ so does the i.d. $e^{f-1}$.
An i.d. law need not belong to its own domain of partial attraction: Use the first
statement and (c).

(e) If $\mathcal{L}(X)$ is partially attracted by $\mathcal{L}(Y)$ which is partially attracted by $\mathcal{L}(Z)$
then $\mathcal{L}(X)$ is partially attracted by $\mathcal{L}(Z)$. The domain of partial attraction of a
stable law is strictly larger than its domain of attraction.


(f) Let i.d. $f_n = e^{\psi_n}$ have bounded $\psi_n$. Set $\varphi(u) = \sum_{n=1}^{\infty} \psi_n(b_n u)/k_n$. There are
$b_n > 0$ and integers $k_n \to \infty$ such that $k_n \varphi(u/b_n) - \psi_n(u) \to 0$, $u \in R$.

(g) If $f$ is partially attracted by i.d. $e^{\psi_n} \to e^{\psi}$ then it is partially attracted by
$e^{\psi}$. Is i.d. property of the $e^{\psi_n}$, $e^{\psi}$ needed?

(h) Every i.d. $f = e^{\psi}$ has a nonempty partial domain of attraction: Note
that there are compound Poisson $e^{\psi_n} \to f$, and use $\lim e^{k_n \varphi(u/b_n)} = \lim e^{\psi_n(u)} = f$.

(i) Lévy example: $f = e^{\psi}$ with $\psi(u) = 2 \sum_{k=-\infty}^{\infty} 2^{-k}(\cos 2^k u - 1)$ is i.d. Find its
Lévy function. Show that $f^{2^n}(u) = f(2^n u)$; $f$ is not stable but partially attracts
itself.

(j) Every sequence of i.d. laws has an i.d. law belonging to the domain of
partial attraction of each of its terms.

(k) Doblin universal laws. There are i.d. laws belonging to the domain of
partial attraction of every i.d. law. Consider the countably many i.d. laws, ordered
into a sequence $e^{\psi_1}, e^{\psi_2}, \cdots$, whose Lévy functions are purely discontinuous
with only rational discontinuities and only rational jumps, every i.d. $e^{\psi}$ is limit
of a subsequence of $(e^{\psi_n})$, and use (j).
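
The scaling property behind the Lévy example in (i) can be checked numerically. The sketch below is only an illustration: it truncates the series for $\psi(u) = 2\sum_k 2^{-k}(\cos 2^k u - 1)$ at $|k| \le K$ (the cutoff `K` and the function names are assumptions made here, not notation from the text) and compares $\psi(2u)$ with $2\psi(u)$, which is the relation behind $f^{2^n}(u) = f(2^n u)$.

```python
import math

def levy_psi(u: float, K: int = 40) -> float:
    """Truncated series 2 * sum_{k=-K}^{K} 2^{-k} (cos(2^k u) - 1)."""
    total = 0.0
    for k in range(-K, K + 1):
        x = (2.0 ** k) * u
        # cos(x) - 1 = -2 sin^2(x/2); this form avoids cancellation for tiny x.
        total += (2.0 ** -k) * (-2.0 * math.sin(x / 2.0) ** 2)
    return 2.0 * total

def levy_f(u: float) -> float:
    """Characteristic function f = exp(psi), evaluated with the truncated series."""
    return math.exp(levy_psi(u))

u = 0.7
print(levy_psi(2 * u), 2 * levy_psi(u))   # the two values should nearly agree
print(levy_f(u) ** 4, levy_f(4 * u))      # f^{2^n}(u) versus f(2^n u) with n = 2
```

The agreement is only up to the truncation error, which is of order $2^{-K}$; the check says nothing about stability or partial attraction, which still have to be argued as the exercise asks.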

9. Consider random walk on lattices with, to simplify, span 1. 

(a) Such a random walk forms a constant Markov chain with a countable 
number of states. What is its transition matrix? 

(b) Interpret the concepts and results in the Introductory Part III in terms 
of those in §26. 

(c) Discuss the Introductory Part CDIII in terms of §26 and complete it. 
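
For (a), one way to picture the transition matrix is to print a finite window of it. The sketch below assumes a step law given as a dictionary `step_pmf` (a hypothetical name) mapping integer steps to probabilities, and uses the fact that for a walk with independent identically distributed steps the entry for the move from state $i$ to state $j$ depends only on $j - i$.

```python
from fractions import Fraction

def transition_block(step_pmf, lo, hi):
    """Finite window of the transition matrix on states lo..hi.

    Entry (i, j) is P(X = j - i), so every diagonal is constant.
    Rows near the edge of the window sum to less than 1 because the
    true state space is all of the integers.
    """
    states = range(lo, hi + 1)
    return [[step_pmf.get(j - i, Fraction(0)) for j in states] for i in states]

# Simple symmetric walk: steps +1 and -1 with probability 1/2 each.
simple_walk = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
P = transition_block(simple_walk, -3, 3)
for row in P:
    print([str(p) for p in row])
```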

10. (a) A truly two-dimensional random walk with zero expectations and 
finite variances is recurrent. 

(b) A truly three-dimensional random walk is always transient. What about 
m-dimensional random walks with m > 3? 

For (a) and (b) use ch.f.’s analogously to the one-dimensional case in 26.2. 
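
As a numerical companion to (a), and not a substitute for the ch.f. argument, the expected number of returns to the origin can be tabulated for the simple symmetric walk. The sketch assumes the standard closed forms $P(S_{2n} = 0) = \binom{2n}{n}4^{-n}$ in dimension 1 and its square in dimension 2 (steps to the four nearest neighbours with probability 1/4 each); the function name is an illustrative choice.

```python
import math

def return_prob_partial_sums(dim: int, n_max: int) -> list:
    """Partial sums of P(S_{2n} = 0), n = 1..n_max, for dim in {1, 2}."""
    if dim not in (1, 2):
        raise ValueError("closed form used here covers dim 1 and 2 only")
    sums, total, p = [], 0.0, 1.0
    for n in range(1, n_max + 1):
        p *= (2 * n - 1) / (2 * n)      # p = C(2n, n) / 4^n, updated recursively
        total += p ** dim
        sums.append(total)
    return sums

s2 = return_prob_partial_sums(2, 2000)
# In dimension 2 the terms behave like 1/(pi n), so doubling the range
# should add roughly log(2)/pi to the partial sum: slow divergence.
print(s2[1999] - s2[999], math.log(2) / math.pi)
```

The logarithmic growth of the partial sums is consistent with recurrence in the plane; the three-dimensional case in (b) needs a different formula and is not covered by this sketch.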




11. $ES_n$. Let $EX = 0$ and $\sigma^2 = \sigma^2 X$ $(\le \infty)$.

(a) $E|S_n|/n^{1/2} \ge a$ for some constant $a$ and all $n$. In fact, $E|S_n|/n^{1/2} =
2ES_n^+/n^{1/2} \to \sigma\sqrt{2/\pi}$.

(b) Let $A = (-\infty, 0]$ or $(0, \infty)$.

o<o = ES, and ES, , are both finite, and then 


o 1 
ES.,- Seen Y- PSV at 
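
The limit $\sigma\sqrt{2/\pi}$ of $E|S_n|/n^{1/2}$ in (a) can be illustrated on the simple symmetric walk, where $\sigma = 1$ and $E|S_n|$ is an exact binomial sum. This is a sketch for one particular step law chosen here for convenience, not the general argument; the function name is hypothetical.

```python
import math

def mean_abs_simple_walk(n: int) -> float:
    """Exact E|S_n| for i.i.d. steps +1/-1 with probability 1/2 each."""
    # With k steps equal to +1, S_n = 2k - n, and there are C(n, k) such paths.
    num = sum(abs(n - 2 * k) * math.comb(n, k) for k in range(n + 1))
    return num / 2 ** n   # exact integer ratio, rounded once to float

for n in (4, 100, 400):
    print(n, mean_abs_simple_walk(n) / math.sqrt(n), math.sqrt(2 / math.pi))
```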


12. Arcsine law. (a) Complete the computations in the proof of Arcsine law
in 26.1.

(b) Let $c = \sum_{n=1}^{\infty} \frac{1}{n}\left(\frac{1}{2} - P(S_n > 0)\right)$ be finite. Then
$P(\nu_n = 0) \sim e^{c}/\sqrt{\pi n}$, $P(\nu_n = n) \sim e^{-c}/\sqrt{\pi n}$
and the Arcsine law holds.
(c) Andersen and Spitzer generalizations. Let $a_n = P(S_n > 0)$.

$(a_1 + \cdots + a_n)/n \to a \Rightarrow \mathcal{L}(1 - \nu_n/n) \to \mathcal{L}(Y)$
with $\mathcal{L}(Y) = \mathcal{L}(1)$ for $a = 0$, $\mathcal{L}(Y) = \mathcal{L}(0)$ for $a = 1$ and, for $0 < a < 1$,

$P(Y < y) = \frac{\sin \pi a}{\pi} \int_0^y x^{-a}(1 - x)^{a-1}\,dx$;

if $(a_1 + \cdots + a_n)/n$ does not converge then $\mathcal{L}(1 - \nu_n/n)$ does not converge.
(Andersen case: $a_n \to a$.) If $a = 1/2$, $\mathcal{L}(Y)$ is Arcsine law.

Use Kemperman's recurrence relation: Let $b_n(k) = E(n - \nu_n)^k$; $b_n(0) = 1$,
$b_n(1) = n - (a_1 + \cdots + a_n)$, $b_0(k) = 0$ for $k = 1, 2, \cdots$. Then

$b_n(k+1) = n\,b_n(k) - \sum_{m=0}^{n-1} a_{n-m}\, b_m(k)$.

When $(a_1 + \cdots + a_n)/n \to a$ then $\mathcal{L}(1 - \nu_n/n) \to \mathcal{L}(Y)$ with $EY^k = (1 - a)
(1 - a/2) \cdots (1 - a/k)$; apply Ch. IV, CD 10 (Spitzer).
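
Kemperman's recurrence is easy to iterate exactly, which gives a quick sanity check on the moment limits. The sketch below takes the recurrence in the form $b_n(k+1) = n\,b_n(k) - \sum_{m=0}^{n-1} a_{n-m} b_m(k)$ with $b_n(0) = 1$ and $b_0(k) = 0$ for $k \ge 1$; the function name and the list layout (`a[0]` unused) are assumptions made for the illustration.

```python
from fractions import Fraction

def kemperman_moments(a, k_max):
    """Iterate the recurrence exactly.

    a[j] stands for a_j, j = 1..n_max (a[0] is never read).
    Returns b with b[k][n] = b_n(k) for 0 <= k <= k_max, 0 <= n <= n_max.
    """
    n_max = len(a) - 1
    b = [[Fraction(1)] * (n_max + 1)]             # b_n(0) = 1 for all n
    for k in range(k_max):
        row = [Fraction(0)]                        # b_0(k+1) = 0
        for n in range(1, n_max + 1):
            conv = sum(a[n - m] * b[k][m] for m in range(n))
            row.append(n * b[k][n] - conv)
        b.append(row)
    return b

# Constant case a_n = 1/2 (for instance a symmetric continuous step law).
n_max = 50
a = [Fraction(0)] + [Fraction(1, 2)] * n_max
b = kemperman_moments(a, 3)
for k in (1, 2, 3):
    print(k, float(b[k][n_max] / n_max ** k))
```

With $a = 1/2$ the printed ratios $b_n(k)/n^k$ should approach $(1 - a)(1 - a/2)\cdots(1 - a/k)$, that is $1/2$, $3/8$, $5/16$, the first moments of the Arcsine law.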

13. Identities and limit distributions. Let $\nu_n, \nu'_n, \bar\nu_n, \bar\nu'_n$ be respectively the
number of positive, nonnegative, negative, nonpositive sums in $(S_0, \cdots, S_n)$.
Let $\tau_n, \tau'_n, \bar\tau_n, \bar\tau'_n$ be respectively the time of occurrence of the first maximum $M_n$,
the last maximum $M_n$, the first minimum $\bar M_n$, the last minimum $\bar M_n$ of
$(S_0, \cdots, S_n)$.

(a) The equivalence relation $P(\nu_n = k) = P(\tau_n = k)$ remains valid if same
affixes above are added simultaneously to $\nu$ and to $\tau$; similarly for
$E(e^{iuS_n}; \nu_n = k) = E(e^{iuS_n}; \tau_n = k)$ and, more generally, for $E(f_n; \nu_n = k) = E(f_n; \tau_n = k)$ in
26.1. What about extreme factorizations?

(b) Which exponential identities in 26.3 and results in 26.4 remain valid or
have to be modified accordingly when the same affixes are added?

14. Ranked sums (order statistics). Order the sums as follows: $S_i(\omega)$ precedes
$S_j(\omega)$ if $S_i(\omega) < S_j(\omega)$ or $S_i(\omega) = S_j(\omega)$ but $i < j$. For every $k = 0, \cdots, n$, let
$R_{nk}(\omega)$ be the $k$th from the bottom of $S_0(\omega), \cdots, S_n(\omega)$ according to this
ordering. Let $\tau_{nk}(\omega)$ be the index of corresponding $S_j(\omega)$, that is, $R_{nk}(\omega) = S_j(\omega) \Leftrightarrow
\tau_{nk}(\omega) = j$. Note that $R_{n0} \le \cdots \le R_{nn}$, $R_{n0} = \bar M_n$ is the first minimum
occurring at time $\tau_{n0} = \bar\tau_n$ and $R_{nn} = M_n$ is the last maximum occurring at time
$\tau_{nn} = \tau'_n$.


Discuss the following Wendel identities:


$E s^{\tau_{nk}} = E s^{\tau_{kk}} \cdot E s^{\tau_{n-k,0}}$,
$E e^{iuR_{nk}} = E e^{iuM_k} \cdot E e^{iu\bar M_{n-k}}$,
$E(e^{iuS_n + ivR_{nk}}) = E(e^{iuS_k + ivM_k}) \cdot E(e^{iuS_{n-k} + iv\bar M_{n-k}})$.
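
Before discussing the Wendel identities in general, it can help to confirm the first two on a small case by brute force. The sketch below enumerates all $2^n$ equally likely paths with steps $\pm 1$ (a step law chosen here only for convenience), ranks the sums with ties broken by smaller index first, and compares the law of the $k$th ranked sum, and of its index, with the convolution of the laws for the maximum over $k$ steps and the minimum over $n - k$ steps. All function names are illustrative, and the third (joint) identity is not checked.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def _law(values):
    """Exact law of a list of equally likely outcomes."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: Fraction(c, total) for v, c in counts.items()}

def _convolve(p, q):
    """Law of the sum of two independent variables with laws p and q."""
    out = Counter()
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] += px * qy
    return dict(out)

def ranked(n, k):
    """Laws of the k-th ranked sum and of its index over all +1/-1 paths."""
    r_vals, t_vals = [], []
    for steps in product((-1, 1), repeat=n):
        sums = [0]
        for x in steps:
            sums.append(sums[-1] + x)
        # Order by value; among equal values the smaller index comes first.
        order = sorted(range(n + 1), key=lambda i: (sums[i], i))
        r_vals.append(sums[order[k]])
        t_vals.append(order[k])
    return _law(r_vals), _law(t_vals)

def check_wendel(n):
    """True if the first two identities hold for every k = 0..n."""
    for k in range(n + 1):
        r_nk, t_nk = ranked(n, k)
        max_law, last_max_time = ranked(k, k)          # last maximum of k steps
        min_law, first_min_time = ranked(n - k, 0)     # first minimum of n-k steps
        if r_nk != _convolve(max_law, min_law):
            return False
        if t_nk != _convolve(last_max_time, first_min_time):
            return False
    return True

print(check_wendel(4))
```

Products of expectations such as $E s^{\tau_{kk}} \cdot E s^{\tau_{n-k,0}}$ correspond to convolutions of laws, which is why the check compares distributions rather than generating functions.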